Indian Standard IS 12736 : 1989
ISO 1831 : 1980
(Reaffirmed 2003)

PRINTING SPECIFICATIONS FOR OPTICAL CHARACTER RECOGNITION

UDC 681.32754 : 655.24

© BIS 1990

BUREAU OF INDIAN STANDARDS
MANAK BHAVAN, 9 BAHADUR SHAH ZAFAR MARG
NEW DELHI 110002

February 1990
Price Group 13

NATIONAL FOREWORD

This Indian Standard, which is identical with ISO 1831 : 1980 'Printing specifications for optical character recognition' issued by the International Organization for Standardization (ISO), was adopted by the Bureau of Indian Standards on 6 February 1989. It was adopted on the recommendation of the Computers, Business Machines and Calculators Sectional Committee (LTDC 24) and with the approval of the Electronics and Telecommunication Division Council.

In the adopted standard certain terminology and conventions are not identical with those used in Indian Standards; attention is especially drawn to the following:

a) Wherever the words 'International Standard' appear referring to this standard, they should be read as 'Indian Standard'; and

b) Comma (,) has been used as a decimal marker while in Indian Standards the current practice is to use a point (.) as the decimal marker.

CROSS REFERENCE

International Standard: ISO 1073/1 : 1976 Alphanumeric character sets for optical recognition - Part 1: Character set OCR-A - Shapes and dimensions of the printed image
Corresponding Indian Standard: IS 12755 (Part 1) : 1989 Alphanumeric character sets for optical recognition: Part 1 Character set OCR-A - Shapes and dimensions of the printed image
Degree of correspondence: Identical

The Computers, Business Machines and Calculators Sectional Committee has reviewed the provisions of the following ISO Standards and has decided that these are acceptable for use in conjunction with this standard:

ISO 216 Writing paper and certain classes of printed matter - Trimmed sizes - A and B series.
ISO 1073/2 Alphanumeric character sets for optical recognition - Part 2: Character set OCR-B - Shapes and dimensions of the printed image.
ISO 2469 Paper, board and pulps - Measurement of diffuse reflectance factor.
ISO 2471 Paper and board - Determination of opacity (paper backing) - Diffuse reflectance method.

CONTENTS

0 Introduction
0.1 Interpretation of the International Standard
0.2 Use of the International Standard
0.3 Annexes
1 Scope and field of application
2 References
3 Spectral requirements for OCR
3.1 General
3.2 Spectral bands
4 Paper specifications for OCR
4.1 General
4.2 Luminous reflectance factor R0 of paper
4.3 Dirt in paper
4.4 Paper opacity
4.5 Variation in reflectance of paper
5 Characteristics of the printed image
5.1 General
5.2 Print quality tolerance ranges
5.3 Definition of character outline limits
5.4 Measurements of parameters
0 Introduction

The purpose of this International Standard is to establish the basis for industry standards for paper and printing to be used in Optical Character Recognition (OCR) systems, particularly for document interchange. It also aims to aid in the implementation and use of such systems. It provides guidance for the identification and measurement of the relevant parameters and establishes specifications for their use.

0.1 Interpretation of the International Standard

A printing system is defined as a single unit comprising a printing machine, paper and inked ribbon (the latter only if required by the printing process). A printing system which produces printed material for OCR applications is called an OCR printing system.

In the guarantee of printing systems, the supplier of the printing system is given the right to specify the maintenance rate for the printing system and the supplies to be used (for example paper and ribbon).

In the guarantee of recognition systems, the supplier of the recognition system is given the right to specify the following:
- the environmental conditions (temperature, humidity, illumination, vibrations and electromagnetic noise, etc.);
- the maximum amount of mechanical ...;
- the level of maintenance for the reader, which the supplier is also given the right to establish.

Statistical sampling plans by inspection of attributes can be used to check whether these guarantees are being observed, provided that these plans are coherent with those normally used in quality control. Once a sampling plan has been agreed upon, the sample size (i.e. the number of characters or documents to be examined) is established by the plan.

To allow the printing system to be checked, this International Standard gives the parameters of the printed material to be measured and the measurement methods.

When the recognition system is checked, only printed material meeting the specifications given in this International Standard shall be used. Alternatively, by agreement, representative samples of current material may be used. In the latter case the rejects must be evaluated according to their compliance with this International Standard.

0.2 Use of the International Standard

The measurement methods and the values of the parameters given in this standard are intended for use in OCR applications. Both printing and recognition systems are liable to deviations of a statistical nature, so a continuous, complete fulfilment of these values cannot be achieved. As a result, some rejections and substitutions of characters may occur. The number of rejections and substitutions which are allowed depends on the specific OCR application. It shall be agreed upon, in statistical terms, by the user, the supplier(s) of the printing system and the supplier(s) of the recognition system.

The values in this International Standard shall apply to printed material in OCR font (OCR-A and OCR-B), regardless of the printing system and the specific application. The dimensional and optical characteristics of the printed image are given for three quality ranges. Tolerance limits are specified for each parameter. These limits are to be used in quality control. They shall at least be achieved, but all parameters are expected to be kept well within them.

If some of these parameters deviate from the limits, the number and magnitude of these deviations can be reduced by using special precautions, such as:
- a more accurate choice of the OCR printing system components;
- more frequent maintenance of the printing machine;
- a reduction of the printing speed;
- a shortening of the ribbon life, etc.

The performance of an optical character recognition system may also be subject to variations of a statistical nature, causing rejections or substitutions of characters that are within the tolerance limits. In that case, again, the number and magnitude of these deviations can be reduced by using special precautions, such as more frequent maintenance of the recognition system.

0.3 Annexes

The annexes are not an integral part of this International Standard but give additional information.

1 Scope and field of application

This International Standard contains the basic definitions, measurement requirements, specifications and recommendations for OCR paper and print. Three major parameters of a printed document for OCR media are covered. These are:
- the optical properties of the paper to be used;
- the optical and dimensional properties of the ink patterns forming OCR characters;
- the basic requirements related to the position of OCR characters on the paper.

The major factors of each of these areas pertinent to OCR are identified. Definitions of these items are given and bases for measurements are established. Basic specifications applicable to all OCR materials are imposed, and recommendations for the implementation of an OCR system are made.

2 References

ISO 216, Writing paper and certain classes of printed matter - Trimmed sizes - A and B series.
ISO 1073/1, Alphanumeric character sets for optical recognition - Part 1: Character set OCR-A - Shapes and dimensions of the printed image.
ISO 1073/2, Alphanumeric character sets for optical recognition - Part 2: Character set OCR-B - Shapes and dimensions of the printed image.
ISO 2469, Paper, board and pulps - Measurement of diffuse reflectance factor.
ISO 2471, Paper and board - Determination of opacity (paper backing) - Diffuse reflectance method.
CIE Publication 15 (E-1.3.1) 1971, Colorimetry - Official recommendations.

3 Spectral requirements for OCR

3.1 General

This clause defines spectral bands of interest for OCR applications. These bands need to be defined because character readers operate in specific spectral regions, and paper and ink characteristics change with the wavelength considered.

3.2 Spectral bands

In this clause, a set of bands is defined as reference for the paper and printed image specification. Their use and the procedures are specified in the clauses on measuring reflectance, paper opacity and PCS measurement.

Table 1

Band     Peak, nm     Bandwidth at 50 % level, nm
B 425    425 ± 5      50 or less
B 460    460 ± 5      60 or less
B 490    490 ± 5      60 or less
B 530    530 ± 5      60 or less
B 570    570 ± 10     100 or less
...

The bands B 425 up to B 900 represent the spectral responses required from the complete measuring instrument (light source, filter, detector). These responses shall be smooth curves without secondary peaks and with no major content in the parts of the response curves beyond the specified 50 % points. The energy of the illumination at wavelengths shorter than 400 nm should not exceed 5 % of that in the particular band under consideration.

4 Paper specifications for OCR

4.1 General

The papers to be used in OCR applications should be white (see annex A), have low gloss, and be of high opacity (see annex A). Factors causing variation in reflectance (such as dirt, uneven formation, watermarks and additives) should be avoided. In particular OCR applications, some mechanical properties of paper (such as stiffness, porosity, tear resistance, smoothness, etc.) may be important. For both optical and mechanical properties, agreement between users and manufacturers of OCR systems on the specific papers to be used is advisable.

4.2 Luminous reflectance factor R0 of paper

Reflectance measurements shall be carried out using a reflectometer as described in ISO 2469, or an instrument calibrated against such a reflectometer.
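The band limits of Table 1 and the two-filter rule of 4.2.3 lend themselves to a simple conformity check on a measured instrument response. The following is a minimal sketch, not part of this standard. It assumes the response has already been reduced to a peak wavelength and a bandwidth at the 50 % level. The names `Band`, `matches_band`, `is_green_filter_ok` and `uv_energy_ok` are hypothetical. Only the legible rows of Table 1 (B 425 to B 570) are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """One row of Table 1 (hypothetical representation)."""
    name: str
    peak_nm: float      # nominal peak wavelength
    peak_tol_nm: float  # allowed deviation of the measured peak
    max_bw_nm: float    # maximum bandwidth at the 50 % level


# Only the Table 1 rows legible in this copy; bands up to B 900 are omitted.
TABLE_1 = {
    b.name: b
    for b in (
        Band("B 425", 425, 5, 50),
        Band("B 460", 460, 5, 60),
        Band("B 490", 490, 5, 60),
        Band("B 530", 530, 5, 60),
        Band("B 570", 570, 10, 100),
    )
}


def matches_band(band_name: str, peak_nm: float, bw50_nm: float) -> bool:
    """True if a measured response (peak, 50 % bandwidth) fits the named band."""
    band = TABLE_1[band_name]
    return abs(peak_nm - band.peak_nm) <= band.peak_tol_nm and bw50_nm <= band.max_bw_nm


def is_green_filter_ok(peak_nm: float, bw50_nm: float) -> bool:
    """Second filter of 4.2.3: peak between 530 and 570 nm, bandwidth <= 100 nm."""
    return 530 <= peak_nm <= 570 and bw50_nm <= 100


def uv_energy_ok(energy_below_400nm: float, energy_in_band: float) -> bool:
    """Illumination energy below 400 nm should not exceed 5 % of the band's energy."""
    return energy_below_400nm <= 0.05 * energy_in_band
```

For example, a filter peaking at 428 nm with a 45 nm bandwidth satisfies B 425. The same filter peaking at 432 nm would not, because it lies 7 nm from the nominal peak.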
Reflectance measurements shall be referred to the perfect reflecting diffuser (100 % reflectance). However, in practice barium sulphate (BaSO4) may be used instead, since it gives sufficient accuracy. In case of disagreement, the measurements shall be based on the perfect reflecting diffuser.

4.2.1 Definition of R0

The luminous reflectance factor R0 is the reflectance factor obtained from a single sheet of paper using the black backing method. The sample being measured shall be backed with black having not more than 0,5 % reflectance. The reflectance factor is the ratio, expressed as a percentage, of the radiation reflected by a body to that reflected by a perfect reflecting diffuser under the same conditions.

4.2.2 Measurement of R0

R0 shall be measured using a method similar to that described in ISO 2471, but using the appropriate filters as described below.

4.2.3 Visual spectrum

R0 shall be greater than 60 % in the range from 425 to 500 nm and greater than 70 % in the range from 500 to 700 nm. For white, or slightly but uniformly coloured, papers it is normally sufficient to measure with the two following filters:
- B 425;
- CIE/Y filter, or any filter peaking between 530 nm and 570 nm and having a bandwidth not greater than 100 nm.

In case of doubt, measurements should be carried out throughout the visible spectrum using, for example, the filters B 425 to B 686 described in 3.2.

NOTE - If medium opacity paper (see 4.4.3.2) is used, the values for R0 shall be replaced by 50 % and 60 % respectively.

4.2.4 Near infra-red (IR) spectrum

When the near infra-red spectrum is of interest, R0 shall be not less than 70 % at 900 nm.

NOTE - If medium opacity paper (see 4.4.3.2) is used, the value for R0 shall be replaced by 60 %.

4.3 Dirt in paper

Dirt refers to relatively non-reflecting foreign particles embedded in the sheet. Because such particles reflect little light, an OCR scanner may mistake them for inked areas. It is therefore important that both their frequency and size should be small. Two methods of evaluating dirt in paper are described below. Method A enables a quick evaluation to be made, whilst method B is suitable for a more detailed investigation.

4.3.1 Method A - Grid assay method

4.3.1.1 Equipment

This should consist of the following:
- Grid: a frame 1 m x 1 m (3.28 ft x 3.28 ft) divided into 100 squares by fine wire.
- Working surface: to accept paper and frame, and to allow viewing from a distance of about 0,5 m (1.64 ft).
- Lighting: the lighting should be a close approximation to the CIE recommended illuminant D 65. The recommended illumination level is 750 to 1 500 lx.
- Cleaner: soft brush or vacuum cleaner to remove loose dirt or dust from the sample surface.
- Timer: to indicate 0,5 or 1 min intervals.
- Counter: to tally the number of squares containing dirt.

4.3.1.2 Samples

Sampling and test area: a total of 6 m² (64.58 ft²) shall represent the reel or stack of sheets. Reels shall be sampled from the outer end in the full width, with 6 x 1 m (3.28 ft) samples representing both ends of mill reels (sampling the end of the preceding reel in the manufacturing sequence if necessary). Sheet stacks shall be sampled at 6 positions, with sufficient sheets to make up the total area.

4.3.1.3 Procedure

Lay out the sample with the topside uppermost. Remove loose dirt and dust from the surface. Place the grid over the sample. Start the timer and scan all the squares in turn in 1 min. With the counter, record once only the number of squares seen to contain a dirt particle or particles. Repeat the test on the remaining 5 m² (53.82 ft²). Record the result as the number of squares containing dirt per 6 m² (64.58 ft²). This number shall not exceed 200.

NOTE - Observer-to-observer differences may exceed the variation due to sample change. For comparing results from different units, samples should be exchanged between groups of observers, and the lighting conditions should be assessed by calibration (see CIE Publication 15). Observers can be selected by comparing assays and excluding observers giving significantly high or low assays. Observer comparisons should be made periodically.

4.3.2 Method B - Dirt count

The distribution of the dirt shall be established by a count of all light-absorbent surface particles above a certain size. A paper type fulfils the requirements of this International Standard when both of the following conditions are met:
- 20 samples have an arithmetic mean count of less than 250 dirt particles per m² (10.76 ft²), counting each particle with a diameter greater than 0,1 mm (0.004 in);
- 19 of these samples have, at the most, 25 particles per m² (10.76 ft²) with diameters greater than 0,2 mm (0.008 in).

The samples should preferably be equal to 1 m² (10.76 ft²) but may not be less than 0,125 m² (1.345 ft²), i.e. size A3 (see ISO 216). They shall be independent and provide a statistical representation of the full paper type to be evaluated.

4.4 Paper opacity

Opacity measurement shall be carried out using a reflectometer as described in ISO 2469, or an instrument calibrated against such a reflectometer.

4.4.1 Definition of paper opacity

Opacity (paper backing) is the ratio, expressed as a percentage, of the luminous reflectance factor R0 of a single sheet of the paper with a black backing to the intrinsic luminous reflectance factor R∞ of the same sample of the paper. (This definition corresponds to that in ISO 2471.)

4.4.2 Measurement of paper opacity

The opacity shall be measured using the method described in ISO 2471. The filter used shall give, in conjunction with the optical characteristics of the basic instrument, an overall response equivalent to the spectral bands described in 3.2.

4.4.3 Classes of opacity

4.4.3.1 High opacity paper

High opacity paper shall have an opacity greater than 85 %.

4.4.3.2 Medium opacity paper

Medium opacity paper shall have an opacity greater than 70 % but less than 85 %.

4.5 Variation in reflectance of paper

Reflectance measurements performed with a very small aperture at a number of positions on the paper surface result in a variation of the measurements obtained. These variations shall not exceed a given limit. Due to their statistical nature, the limits for variation in paper reflectance are defined in terms of the allowable variation coefficient of the paper reflectance. This coefficient is measured with an aperture of 0,2 mm (0.008 in) diameter.

Two classes of variations in paper reflectance are specified, namely:
- for high opacity paper (see 4.4.3.1): standard deviation < 3,5 % of the mean reflectance;
- for medium opacity paper (see 4.4.3.2): standard deviation < 5 % of the mean reflectance.

The specification on variation in paper reflectance shall be satisfied in the following bands:
- B 425;
- B 530 or B 570, or any band peaking in between and having a bandwidth smaller than or equal to 100 nm (the CIE/Y spectral energy distribution satisfies this requirement);
- B 900.

In practice the measurements may usually be limited to the most critical band. In doubtful cases, a single band measurement may not be sufficient to show that the specification is satisfied throughout the whole spectrum; the three bands shall then be used. In addition, the ratio of the highest to the lowest value obtained by the measurements according to the above specification shall not exceed 1,2. Detailed procedures are laid down in annex A.

5 Characteristics of the printed image

5.1 General

In addition to the properties of the paper, the properties of the printed characters, i.e. the print quality, are critical in the recognition of characters. Characters to be read by optical recognition machines have to be of higher print quality than characters to be read by the human eye only. To achieve this quality, appropriate printing machines, inks and ribbons shall be used, and they shall be adequately operated and maintained.

Assessment of print quality shall include two examinations:
- the geometry of the printed pattern (character shape);
- the intensity of inking on the paper (print contrast).

The characteristics of the ink (spectral response) are also of importance. The characteristics described hereafter apply to the printed image. They do not apply to the printing device (for example type faces) with which the printed image is produced.

5.2 Print quality tolerance ranges

In general, the tolerances on print quality parameters in a successful OCR system will depend on three factors:
- the reader characteristics;
- the required performance level;
- the number of characters in the reading repertoire considered.

To accommodate these variations in capability of specific categories of printing and reading devices, three ranges of print quality are defined:
- Print tolerance range X: tight tolerances
- Print tolerance range Y: medium tolerances
- Print tolerance range Z: wide tolerances

It should be noted that characters in range Z are reaching the limit of good quality print and are likely to give rise to an increased reject rate in many applications. Range Z characters can only be measured successfully by means of computer-aided methods (see 5.4.6).

5.3 Definition of character outline limits

Consider a given character, in a specified font, character size and tolerance range. Its minimum and maximum character outline limits (COL) are the outlines of an ideal printed image of that character with all the strokes having the respective strokewidth as specified in 5.3.1. A COL gauge is a drawing on a transparent base of the two COLs and the centreline. Rules for the construction of COL gauges are given in 5.3.2 to 5.3.7.

5.3.1 Nominal strokewidth (see table 2)

For COL constructions the following nominal strokewidths and tolerances apply. The heights indicated are exact for OCR-A. For OCR-B they are indicative; exact values shall be measured from the OCR-B drawings in ISO 1073. For OCR-B, the nominal strokewidth of the small letters and of the characters #, %, @ is 0,31 mm (0.012 in) for size I and 0,44 mm (0.017 in) for size IV.

Table 2 - Nominal strokewidth

Size   Height, mm (in)        Nominal strokewidth,   Tolerance ±, range X,   Tolerance ±, ranges Y and Z,
                              mm (in)                mm (in)                 mm (in)
I      2,40 (0.094)           0,35 (0.014)           0,08 (0.003)            0,15 (0.006)
III    3,20 (0.126)           0,38 (0.015)           0,08 (0.003)            0,18 (0.007)
IV     OCR-A 3,80 (0.150)     0,51 (0.020)           0,13 (0.005)            0,25 (0.010)
       OCR-B 3,60 (0.142)

5.3.2 Construction of the COL gauges

For a given character size and tolerance range, the minimum COL is the geometric envelope of a circle with a diameter equal to the minimum strokewidth, centred on and moved along the centreline. Likewise, the maximum COL is the geometric envelope of a circle with a diameter equal to the maximum strokewidth, centred on and moved along the centreline.

As exceptions to these general rules, the following rules apply to the free ends of strokes and to corners of the stroke centreline of the gauges. These rules refer to "external" and "internal" corners, which are defined as follows (see figure 1):
- An external corner is a corner where the angle defined by the corresponding strokes of the centreline is greater than 180°.
- An internal corner is a corner where the angle defined by the strokes of the centreline is smaller than 180°.

Figure 1 - Internal and external corner of stroke elements (shows an internal corner, an external corner, the minimum COL and the maximum COL)

5.3.3 Fairing radii

The following fairing radii shall be used as indicated in 5.3.4 and 5.3.5. The same fairing radii are used for the construction of OCR-A and OCR-B COL gauges.

Table 3

Size   Fairing radius, minimum COL, R1, mm (in)   Fairing radius, maximum COL, R2, mm (in)
I      0,10 (0.004)                                0,10 (0.004)
III    0,10 (0.004)                                0,13 (0.005)
IV     0,13 (0.005)                                0,20 (0.008)

5.3.4 Special rules for minimum COL

When the minimum COL presents an internal corner with a radius equal to or less than R1 (see 5.3.3), it shall be drawn with a sharp corner. The sharp corner is defined by the tangents to the envelope at the point where the radius changes from greater than or equal to R1 to smaller than R1 (see figure 2).

5.3.5 Special rules for maximum COL

5.3.5.1 Internal corner

When the maximum COL presents a sharp internal corner or a radius smaller than R2 (see 5.3.3), a fairing radius equal to R2 shall be used (see figure 2).

5.3.5.2 External corner

When the centreline has a sharp corner, the external corner of the maximum COL shall be drawn as a sharp corner also (see figure 2). An exception to this rule applies if the stroke centreline has a corner with an angle of more than 305°. In this case, the external corner of the maximum COL shall be drawn as a tangent to the envelope, perpendicular to the bisector of the corner defined by the stroke centreline (see figure 3).

5.3.5.3 Free stroke ends

At free stroke ends, the maximum COL shall be squared off by tangents to the envelope drawn parallel and perpendicular to the corresponding free end of the stroke centreline (see figure 2).

Figure 2 - Special situations at minimum and maximum COL

5.3.6 Letterpress font

The letterpress font characters of OCR-B may be checked with the same gauges, constructed according to the rules stated above, in range X, size I. Attention shall be given to the following special features:

5.3.6.1 The nominal strokewidth of the letterpress font is not constant. It may deviate from the nominal value of the constant strokewidth font in range X. These deviations are 5 % to 10 % of the nominal value and can be neglected.

5.3.6.2 The nominal stroke outlines of some characters end with sharp corners of considerably less than 90°. At these corners, the stroke edges may extend outside the maximum COL and inside the minimum COL. These extensions are allowed if they are not obviously due to voids or spots; voids and spots are subject to the relevant specifications. However, there is no specific set of gauges for the letterpress font.

5.3.7 Additional rules for the construction of COL gauges for range Z

As mentioned in 5.2, characters in range Z can only be measured reliably by means of a computer-aided method (see 5.4.6). In this case special COL gauges shall be used.

Printed images that do not fulfil the shape requirements defined for range X and range Y may still be recognized by OCR machines. This holds provided that their deviations from the requirements given for range Y are within certain known limits, and that the character repertoire is restricted to numeric or certain sub-sets. The deviations most commonly met in practice are asymmetrical violations of the minimum COL on one side of the character (at the top, at the bottom, on the right side or on the left side). Such a violation is called cut-off. Cut-off may happen, for example, with high speed printers (see figure 4).

The limit for the allowed cut-off shall be given by cut-off limit lines (see figure 5). The cut-off limit lines define a rectangle which is of equal size for all characters of a given font and a given size. The dimensions of this rectangle are the horizontal and vertical dimensions of the largest character, measured along the character centreline. The dimensions are given below for the different fonts and sizes.

Table 4

Font   Size   Height, mm (in)   Width, mm (in)
A, B   I      2,40 (0.094)      1,40 (0.055)
A, B   III    3,20 (0.126)      1,52 (0.060)

NOTE - Characters with the minimum COLs within the rectangle defined above for the cut-off limit lines shall have no cut-off.

The horizontal position of the rectangle shall be centred on the vertical centreline of the characters of font A, and on the vertical reference line of the characters of font B. The vertical position shall be defined by the distance d between the base line of the rectangle and the horizontal reference line (see figure 5). The values of distance d are shown below.

Table 5

Font   Size   Distance d, mm (in)
A      I      0,00 (0.000)
A      III    0,00 (0.000)
A      IV     0,00 (0.000)
B      I      0,13 (0.005)
B      III    0,18 (0.007)
B      IV     0,20 (0.008)

In the measuring gauge, the cut-off limit lines shall be defined only inside the maximum COL. Examples for each character are shown in figure 6.

For those stroke elements that are affected by cut-off, a cut-off centreline is defined as follows. The cut-off centreline is the geometrical locus of the centres of all circles that can be drawn between the cut-off limit line and the internal line of the non-violated minimum COL. At the intersection between the cut-off limit line and the minimum COL of a gauge stroke-element, the cut-off centreline must fit into the gauge centreline.
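Clause 5.3.2 defines each COL as the envelope of a circle swept along the stroke centreline. The sketch below rasterizes that idea on a square grid so that a scanned character can be compared with a gauge. It illustrates the idea under stated assumptions and is not the gauge construction of this standard. The assumptions are:
- the centreline is supplied as a polyline in millimetres;
- the character is a boolean grid with cell size `pitch` mm;
- the minimum and maximum strokewidths are taken as the nominal strokewidth minus and plus the tolerance of Table 2;
- the special rules of 5.3.3 to 5.3.7 (fairing radii, corners, free stroke ends, cut-off) are not applied.

All function names are hypothetical.

```python
import math

Point = tuple[float, float]


def sample_polyline(vertices: list[Point], step: float) -> list[Point]:
    """Points spaced at most `step` apart along a stroke centreline polyline."""
    pts = [vertices[0]]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        pts.extend((x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n) for k in range(1, n + 1))
    return pts


def envelope_mask(centreline: list[Point], diameter: float,
                  width: int, height: int, pitch: float) -> list[list[bool]]:
    """Grid cells whose centres lie inside a circle of `diameter` swept along the centreline.

    Only the plain geometric envelope of 5.3.2 is built; the circle is placed at
    discrete centreline samples half a pitch apart, so the boundary is approximate.
    """
    r = diameter / 2
    pts = sample_polyline(centreline, pitch / 2)
    mask = [[False] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = (col + 0.5) * pitch, (row + 0.5) * pitch  # cell centre in mm
            mask[row][col] = any(math.hypot(cx - px, cy - py) <= r for px, py in pts)
    return mask


def check_against_gauge(ink: list[list[bool]], centreline: list[Point],
                        min_sw: float, max_sw: float, pitch: float) -> tuple[float, int]:
    """Return (fraction of the minimum COL that is inked, inked cells outside the maximum COL)."""
    h, w = len(ink), len(ink[0])
    min_col = envelope_mask(centreline, min_sw, w, h, pitch)
    max_col = envelope_mask(centreline, max_sw, w, h, pitch)
    inside = [(r, c) for r in range(h) for c in range(w) if min_col[r][c]]
    filled = sum(ink[r][c] for r, c in inside) / len(inside) if inside else 1.0
    outside = sum(1 for r in range(h) for c in range(w) if ink[r][c] and not max_col[r][c])
    return filled, outside
```

A filled fraction below 1 points to missing ink inside the minimum COL. A non-zero outside count points to ink beyond the maximum COL. Either result would then be judged against the void, spot and edge irregularity rules of 5.4. Keeping the grid pitch well below the 0,2 mm inspection circle keeps the discretization error small.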
Figure 3 - Special corner at maximum COL

Figure 4 - Cut-off character

Figure 5 - Cut-off limit lines (shows the vertical reference line, the horizontal reference line for font A and the horizontal reference line for font B)

Figure 6 - Examples of gauges with cut-off limit lines (shows the maximum COL, the minimum COL and the cut-off limit line)

Figure 7 a) - Adjustment of a character without consideration of the cut-off limit line

Figure 7 b) - Adjustment of the same character with consideration of the cut-off limit line

5.4 Measurements of parameters

5.4.1 General

For machine recognition of the printed image, the print contrast signal (PCS) of all parts of the character shall be sufficiently high, i.e. above a minimum value. This is necessary for the image to be distinguishable from the background. For optimum reliability of reading, a major part of the character shall have a higher PCS value than the minimum that the specification allows for any particular small area portion. Reliability of reading may also decrease as the unevenness of the printing within the characters increases.

5.4.2 Measuring methods

This International Standard describes three measuring methods, in increasing order of sophistication: the visual method, the instrumented method and the computer-aided method.

The visual method is intended for quick and cursory examination of characters in field applications. Not all parameters defined hereafter can be measured visually.

The instrumented method requires a reflectometer, i.e. an instrument able to measure print contrast. This second method gives results which are in practice sufficiently reliable, but it requires a certain time to carry out.

The computer-aided method requires a scanner of high resolution and a specialized computer program. The program evaluates the measurements and computes the parameter values. The results are of high reliability. This third method also requires, of course, time.

An effort has been made to achieve close agreement between visual, instrumented and computer-aided measurements. Exact correlation is not always possible, and some differences can arise when carrying out measurements. In case of conflict between two measurement methods, the more sophisticated technique shall be decisive.

Only the computer-aided method lists requirements and values for measurements in print range Z.

5.4.3 General definitions of parameters

Hereafter, the parameters of the printed image are defined in general terms. More precise definitions are given for each of the measuring methods, together with the description of the measuring procedures. It should be noted that the parameters PCS, PCS within a character, PCSmax, PCSmin and CVR cannot be measured with the visual method.

5.4.3.1 print contrast: The difference between the reflectance of a character and that of the paper on which it is printed.

5.4.3.2 print contrast signal (PCS): The ratio of the print contrast to the reflectance of the paper on which the character is printed.

5.4.3.3 character best fit: The position of a COL gauge over a character for which the character fills the minimum COL as much as possible and at the same time extends as little as possible beyond the maximum COL.

5.4.3.4 print contrast signal within a character: PCS values measured along the centreline.

5.4.3.5 character PCSmax: Gained from the darkest parts of a character along the centreline.

5.4.3.6 character PCSmin: Gained from the lightest parts of a character along the centreline.

5.4.3.7 contrast variation ratio within a character (CVR): The ratio of PCSmin to PCSmax.

5.4.3.8 voids: Areas inside the minimum COL which are significantly lighter than the rest of the character.

5.4.3.9 stroke edges: Defined by the points where the reflectance is approximately halfway between that of the adjacent area of the stroke and that of the background.

5.4.3.10 edge irregularities: Parts of the stroke edge extending either within the minimum COL or outside the maximum COL.

5.4.3.11 spots: Areas outside the maximum COL which contrast with the background.

5.4.4 Visual method

5.4.4.1 Apparatus

The measuring apparatus shall consist of a set of COL gauges corresponding to the character repertoire under consideration, and of an appropriate optical magnifier (for example, a magnifying glass).

5.4.4.2 Print contrast

PC is the difference in reflectance between that of the paper on which the character is printed and that of the character itself.

5.4.4.3 Best fit

Best fit shall be obtained visually by moving the gauge over the character to be investigated. The best fit position is that for which the character fills the minimum COL as much as possible and at the same time extends as little as possible beyond the maximum COL (see figure 8).

5.4.4.4 Voids (see figure 9)

Voids are areas inside the minimum COL which have a significantly lower density than the printed image. The distinction between allowable and non-allowable voids shall be based on a measurement of their size and distance.

One or more voids shall be allowable if they are contained entirely in an inspection circle of 0,2 mm (0.008 in) diameter and if their total surface is smaller than one-third of the surface of the inspection circle.

Their total surface may instead be greater than one-third of that of the inspection circle while still contained entirely within it. In that case a second inspection circle (0,2 mm; 0.008 in diameter) is placed over the nearest void or group of voids whose total surface is likewise greater than one-third of that of its circle. The distance between the centres of the two circles shall be at least 1 mm (0.04 in).
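The definitions in 5.4.3 reduce to simple arithmetic once reflectances are available, for example from the instrumented method. The helpers below are a hypothetical sketch, not defined by this standard. Reflectances may be given in percent or as fractions, since the ratios do not depend on the unit. The list `centreline_reflectances` is assumed to hold readings taken along the character centreline.

```python
def print_contrast(paper_reflectance: float, character_reflectance: float) -> float:
    """PC (5.4.3.1): paper reflectance minus character reflectance."""
    return paper_reflectance - character_reflectance


def pcs(paper_reflectance: float, character_reflectance: float) -> float:
    """PCS (5.4.3.2): print contrast divided by the paper reflectance."""
    if paper_reflectance <= 0:
        raise ValueError("paper reflectance must be positive")
    return print_contrast(paper_reflectance, character_reflectance) / paper_reflectance


def cvr(paper_reflectance: float, centreline_reflectances: list[float]) -> float:
    """CVR (5.4.3.7): PCSmin / PCSmax over readings taken along the centreline."""
    values = [pcs(paper_reflectance, r) for r in centreline_reflectances]
    if not values:
        raise ValueError("need at least one centreline reading")
    pcs_max = max(values)  # darkest parts of the character (5.4.3.5)
    pcs_min = min(values)  # lightest parts of the character (5.4.3.6)
    if pcs_max <= 0:
        raise ValueError("character is not darker than the paper")
    return pcs_min / pcs_max
```

Take paper at 80 % reflectance and centreline readings of 0 %, 20 % and 40 %. The PCS values are then 1.0, 0.75 and 0.5, so CVR = 0.5 / 1.0 = 0.5.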
Figure 8 - Gauge in its "best fit" position (maximum COL, minimum COL)

Figure 9 - Voids (voids allowed in unlimited number: < 1/3 circle area; allowable voids: > 1/3 circle area, at least 1 mm apart; non-allowable void; maximum and minimum COL)

5.4.4.5 Edge irregularity (see figure 10)
An edge irregularity exists where the character extends outside the maximum COL and/or where a part of the character is missing inside the minimum COL. An edge irregularity shall be allowed if the projecting part of the character measured along the maximum COL, and/or the missing part of the character measured along the minimum COL, does not exceed 0,3 mm (0.012 in). Furthermore, the distance between adjacent irregularities shall be at least 1 mm (0.04 in), measured centre to centre.

5.4.4.6 Spots (see figure 11)
Spots are areas outside the maximum COL which contrast with the background. The distinction between allowable spots and non-allowable spots is based on a measurement of their size and distance. Spots may be connected or adjacent to parts of the printed image, or may be free standing within the clear area (see 6.10).

When the measuring gauge is in best fit to a character, any extraneous ink outside the maximum COL is a spot. Any extraneous ink of sufficient area that is nearly as dark as, or darker than, the lightest printing within the minimum COL shall be checked.

One or more spots shall be allowable if they are contained entirely in an inspection circle of 0,2 mm (0.008 in) diameter and their total surface is smaller than one-third of the surface of the circle.

A spot or group of spots may be contained entirely within the inspection circle but have a total surface greater than one-third of that circle. In that case, consider the nearest spot or group of spots that likewise has a total surface greater than one-third of its own inspection circle (0,2 mm; 0.008 in diameter). The distance between the centres of the two inspection circles shall be at least 1 mm (0.04 in).

Figure 10 - Edge irregularity (dimensions in millimetres; allowable and non-allowable edge irregularities relative to the maximum and minimum COL)

Figure 11 - Spots (spots allowed in unlimited number: < 1/3 circle area; allowable spots: > 1/3 circle area; non-allowable spots; 0,2 mm circle; maximum COL)

5.4.5 Instrumented measurement

5.4.5.1 Apparatus arrangement
Illumination: incandescent lamp.
Geometry of illumination: one source at 45° with respect to the paper surface; illuminated area large compared with the measuring aperture.
Geometry of scanner: 90° with respect to the paper surface; aperture 0,2 mm (0.008 in) diameter at sample surface.
Spectral response: see 3.2.
White reference: see 4.2.
The reflectance measurements shall be made using the black backing method.

5.4.5.2 Print contrast (see figure 12)
Print contrast is the difference between the reflectance Rw of the paper on which a character is printed and the reflectance Rp of a point of that character:

$$\mathrm{PC} = R_w - R_p$$

where

Rw is the maximum reflectance found within the area of interest to which the PC of point p is referenced. (In measuring printed images, the area of interest shall be a rectangle of approximately twice the nominal character height by twice the nominal character width, centred on the character being measured.)

Rp is the reflectance from a small measurement area centred on point p.

The reflectances Rw and Rp shall be measured within an area of 0,2 mm (0.008 in) diameter if circular, or of 0,15 mm (0.005 9 in) side if square. These specifications deal only with diffuse reflectance; the reflected light used for measurement shall exclude specularly reflected light. When determining the value of PC, the reflectance measurements of Rw and Rp shall be referred to BaSO4 as the 100 % value.

The PC of any point of a printed image is highly dependent on the spectral properties of the ink used to create the printed image.

5.4.5.3 Print contrast signal (PCS)
The print contrast signal is defined by

$$\mathrm{PCS} = \frac{R_w - R_p}{R_w}$$

It relates the print contrast of any selected point to the reflectance of the paper on which it is printed. Reflectance values are normally referred to BaSO4 as the 100 % value, but this is not necessary when determining PCS. The value of PCS depends only on the relative reflectance values of Rw and Rp.

Figure 12 - Print contrast (dimensions in millimetres; Rw, Rp, 0,2 mm aperture, base line)

5.4.5.4 Best fit
All measurements described hereafter shall be made with the character in the "best fit" position relative to the COL masks. On an instrument, best fit can be achieved visually by positioning the character image so that it fills the minimum COL as much as possible without extending beyond the maximum COL. More specifically, the overall reflectance within the minimum COL shall be a minimum. If this condition is met for several positions, the best fit position is the one yielding the maximum reflectance outside the maximum COL. Light portions of the character inside the minimum COL, and dark portions outside the maximum COL, shall be checked for edge irregularities, voids and spots.

5.4.5.6 PCSmax
PCSmax is the highest average of consecutive basic PCS values. The average is taken over three consecutive values for characters with a centreline longer than 2 mm (0.08 in), and over five consecutive values for characters with a centreline of less than 2 mm (0.08 in).
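The instrumented definitions of PCS (5.4.5.3) and of PCSmax and PCSmin (5.4.5.6, 5.4.5.7) can be sketched in a few lines of Python. The sketch assumes the basic PCS values are already available as a list, recorded in measurement order as described in 5.4.5.5.1. The text does not say how a centreline of exactly 2 mm is handled; averaging five values in that case is an assumption.

```python
def pcs(r_w, r_p):
    """Print contrast signal of 5.4.5.3: PCS = (Rw - Rp) / Rw."""
    if r_w <= 0:
        raise ValueError("Rw must be a positive reflectance")
    return (r_w - r_p) / r_w


def consecutive_averages(basic, centreline_mm):
    """Averages of 3 (centreline > 2 mm) or 5 (otherwise) consecutive basic PCS values."""
    n = 3 if centreline_mm > 2.0 else 5  # assumption: exactly 2 mm uses five values
    if len(basic) < n:
        raise ValueError(f"need at least {n} basic PCS values")
    return [sum(basic[i:i + n]) / n for i in range(len(basic) - n + 1)]


def pcs_max(basic, centreline_mm):
    """Highest average of consecutive basic PCS values (5.4.5.6)."""
    return max(consecutive_averages(basic, centreline_mm))


def pcs_min(basic, centreline_mm):
    """Lowest average of consecutive basic PCS values (5.4.5.7)."""
    return min(consecutive_averages(basic, centreline_mm))
```

For example, for a 3 mm centreline with basic values 0,7, 0,8, 0,9 and 0,6, the three-value averages are 0,80 and about 0,77, so PCSmax is 0,80.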
5.4.5.5 Print contrast signal within a character

5.4.5.5.1 Basic values
Most parameters described hereafter shall be derived from a set of basic PCS values, obtained as follows.

Place a gauge for the required range on the character to be measured; this gauge bears the minimum COL, the maximum COL and the centreline. Move an aperture of 0,2 mm (0.008 in) diameter along the whole centreline of the gauge in steps of 0,1 mm (0.004 in). If the centreline is shorter than 2 mm (0.08 in), the measurements shall be made in steps of 0,05 mm (0.002 in). All PCS values obtained shall be recorded in the sequence in which they were measured.

5.4.5.5.2 PCS80 %
The smallest value of the highest 80 % of the basic PCS values is called PCS80 %. It shall satisfy the following conditions:
PCS80 % > 0,60 in range X
PCS80 % > 0,50 in range Y
For certain OCR applications, the value given for PCS80 % in range Y might be too stringent. Deviations from this value shall be agreed upon by the parties concerned.

5.4.5.7 PCSmin
PCSmin is the lowest average of consecutive basic PCS values. The average is taken over three consecutive values for characters with a centreline longer than 2 mm (0.08 in), and over five consecutive values for characters with a centreline of less than 2 mm (0.08 in).

5.4.5.8 Contrast variation ratio within a character
The variation of contrast within a character is defined by the contrast variation ratio:

$$\mathrm{CVR} = \frac{\mathrm{PCS}_{\max}}{\mathrm{PCS}_{\min}}$$

The CVR must satisfy the following conditions:
CVR < 1,50 in range X
CVR < 1,75 in range Y

Figure 13 - Gauge in its "best fit" position (maximum COL, minimum COL)

5.4.5.9 Voids
Voids are areas inside the minimum COL which have a significantly lower density than the printed image. The distinction between allowable voids and non-allowable voids is based on a measurement of their size and distance. Small voids shall be permitted provided certain conditions are satisfied; larger ones shall not. The size of a void depends on the PCS level at which it is measured.

All basic PCS values lower than PCS80 % shall be considered. The values of d used hereafter are:
d = 0,40 in range X
d = 0,35 in range Y
A void is present at points for which a PCS of less than d is measured. A void shall be allowable if it satisfies the following conditions:

a) Characters with a centreline of more than 2 mm (0.08 in):
- if a point has a PCS < d and both adjacent points have a PCS > d, an allowable void is present at this point;
- if two adjacent points have a PCS < d, an allowable void is present only if the distance to the next similar pair of points is at least 11 steps;
- three or more consecutive points with a PCS < d define a non-allowable void.

b) Characters with a centreline of less than 2 mm (0.08 in):
- single points or pairs of two consecutive points having a PCS < d define allowable voids;
- groups of three or four consecutive points having a PCS < d define an allowable void only if the distance to the next similar group of points is at least 21 steps;
- groups of five or more consecutive points with a PCS < d define a non-allowable void.

5.4.5.10 Stroke edge

5.4.5.10.1 PCS average
The PCS average (PCSav) is the arithmetic mean of the highest 80 % of the basic PCS values. (Not to be confused with PCS80 %.)

5.4.5.10.2 Inspection of the stroke edges
Move an aperture of 0,2 mm (0.008 in) diameter in steps of 0,2 mm (0.008 in), first along the minimum COL and then along the maximum COL. The stroke edges shall be considered within specification if the values obtained along the minimum COL are always greater than 0,5·PCSav and those obtained along the maximum COL are always less than 0,5·PCSav. If 0,5·PCSav is smaller than 0,3, the inspection of the stroke edges shall be performed with this expression replaced by the fixed value 0,3. See annex B.

If these conditions are not met and the stroke edges exceed one or both COLs, the character shall be checked for edge irregularities.

5.4.5.11 Edge irregularities
An edge irregularity is a point for which the measurements described in 5.4.5.10.2 give either a value less than 0,5·PCSav along the minimum COL or a value greater than 0,5·PCSav along the maximum COL. An edge irregularity is allowable only if it is at least 1 mm (0.04 in) from any other edge irregularity.

5.4.5.12 Spots
Spots are areas outside the maximum COL which contrast with the background. The distinction between allowable spots and non-allowable spots shall be based on a measurement of their size. Small spots shall be permitted provided certain conditions are satisfied; larger ones shall not. Spots may be connected or adjacent to parts of the printed image, or may be free standing.

Spots shall be measured with an aperture of 0,2 mm (0.008 in) diameter, centred on the spot at the point with the highest PCS value. When this position has been identified, the eight surrounding positions, defined by steps of 0,1 mm (0.004 in) horizontally and vertically, shall also be measured. The values of e used hereafter are:
e = 0,65·PCSmin in range X
e = 0,70·PCSmin in range Y

After measurement of the nine positions mentioned:
- if at least three positions have a PCS > e, the spot shall not be allowable;
- if at most one position has a PCS > e, the spot shall be allowable;
- if two positions have a PCS > e, the aperture shall be centred on the position with the smaller PCS, and the seven remaining positions defined by steps of 0,1 mm (0.004 in) horizontally and vertically shall also be measured. If a third position with a PCS > e is found, the spot shall not be allowable. If no third position with a PCS > e is found, the spot shall be allowable only if its distance to a spot of the same type and to the maximum COL is greater than 1 mm (0.04 in).

If in this procedure one or more positions have their centre within the maximum COL, these positions shall be disregarded.

Spots remote from the character, i.e. outside the area of interest, are not subject to PCS limitations. However, if they are located in the clear area (see 6.10), their size shall be limited to 0,2 mm (0.008 in) in diameter.

5.4.6 Computer-aided method
An implementation of this method exists and is described in annex C.

5.4.6.1 Apparatus arrangement
The characteristics of the high-resolution scanner to be used shall be in accordance with the following arrangement:
Spatial resolution: 25 µm (0.001 in) or finer.
Aperture: 25 µm (0.001 in) in diameter, or equivalent to the degree of resolution.
Geometry of illumination: one source at 45° with respect to the paper surface, with an illuminated area large compared with the measuring aperture. Alternatively, illumination with a small aperture at 90° with respect to the paper.
Geometry of scanner: 90° with respect to the paper surface; aperture 0,2 mm (0.008 in) diameter at sample surface. Alternatively, one scanner with a large sensitive area at 45° with respect to the paper.
Spectral response: see 3.2.
White reference: see 4.2.
Grey scale resolution: 32 or more grey levels.
All definitions of print parameters given hereafter are based on integrating the scanned values over a circular area of up to 0,2 mm (0.008 in) diameter.

5.4.6.2 Print contrast (see table 6)
The print contrast is the difference between the reflectance Rw of the area of the paper on which the character is printed and the reflectance Rp of the point under inspection. Rw is the maximum reflectance of the paper; it shall be measured in a rectangle Q centred upon the character to be investigated. The dimensions of rectangle Q are shown in table 6. Rp is the reflectance at the point p under consideration.
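The void rules of 5.4.5.9 are essentially a run-length classification of the basic PCS sequence. The Python sketch below also computes PCS80 % (5.4.5.5.2). It rests on two assumptions, both labelled in the code. First, "the highest 80 %" is taken to round up to a whole number of values. Second, the "distance to the next similar pair or group" is taken as the difference between the start indices of consecutive conditional runs, counted in measuring steps. All function names are hypothetical.

```python
D_VALUES = {"X": 0.40, "Y": 0.35}  # void threshold d per print tolerance range (5.4.5.9)


def pcs80(basic):
    """Smallest of the highest 80 % of basic PCS values (5.4.5.5.2)."""
    if not basic:
        raise ValueError("no basic PCS values")
    k = (4 * len(basic) + 4) // 5  # ceil(0.8 * n); rounding up is an assumption
    return sorted(basic, reverse=True)[k - 1]


def void_runs(basic, d):
    """Return (start_index, length) for each run of consecutive points with PCS < d."""
    runs, start = [], None
    for i, value in enumerate(list(basic) + [float("inf")]):  # sentinel closes a final run
        if value < d and start is None:
            start = i
        elif value >= d and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


def voids_allowable(basic, tolerance_range, centreline_mm):
    """Classify voids along one centreline scan according to 5.4.5.9 a) or b)."""
    d = D_VALUES[tolerance_range]
    if centreline_mm > 2.0:   # case a): steps of 0,1 mm
        free_len, cond_len, min_gap = 1, 2, 11
    else:                     # case b): steps of 0,05 mm
        free_len, cond_len, min_gap = 2, 4, 21
    runs = void_runs(basic, d)
    if any(length > cond_len for _, length in runs):
        return False  # run too long: non-allowable void
    # Runs longer than free_len are allowed only if spaced by at least min_gap steps
    # (assumption: spacing measured start index to start index).
    conditional = [start for start, length in runs if length > free_len]
    return all(b - a >= min_gap for a, b in zip(conditional, conditional[1:]))
```

Checking only neighbouring conditional runs is enough, because each run's nearest similar run on either side is its predecessor or its successor in the sorted list.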
The results from a measuring arrangement with a large area of illumination and a small scanner area are comparable with those obtained from an arrangement with a small area of illumination and a large scanner area. An arrangement with both a small area of illumination and a small scanner shall not be permitted.

If the spectral response of the optical character recognition system in use is known, the parameters of the printed images may be tested in this spectral band only. Otherwise, testing shall be performed in all spectral bands given in 3.2. While these parameters are being measured, the paper shall be backed by a medium with less than 3 % reflectance.

5.4.6.3 Definition of the print contrast signal
The PCS is defined by the equation

$$\mathrm{PCS} = \frac{R_w - R_p}{R_w}$$

Table 6 - Print contrast: height × width of rectangle Q
The table gives rectangle Q dimensions for fonts as defined in ISO 1073, for sizes I, III and IV, and for the character sets "All characters", "Sub-sets 1, 2 as defined in ISO 1073" and "Sub-sets 3, 4 as defined in ISO 1073". The dimensions given, in millimetres with inch equivalents, are:
3,90 × 2,50 (0.154 × 0.100)
4,80 × 2,70 (0.190 × 0.107)
5,60 × 3,40 (0.221 × 0.134)
4,30 × 2,50 (0.170 × 0.100)
3,90 × 2,50 (0.154 × 0.100)

5.4.6.4 Best fit

5.4.6.4.1 General and definition of best fit
All measurements and investigations on parameters specified in the following shall be performed after centring the COL gauge upon the printed character (best fit).

Definition of best fit: the COL gauge shall be adjusted to be either perpendicular or parallel to the document reference edge. To determine the best fit, the preliminary stroke edges of the printed character shall first be defined. For this purpose, the arithmetical average PCS1 of all PCS values equal to or greater than 0,3 within the rectangle Q shall be established. The preliminary stroke edges of the character are then found at

$$\mathrm{PCS}_2 = 0{,}5\,(\mathrm{PCS}_1 + 0{,}3)$$

The COL gauge shall be moved horizontally and vertically along the preliminary stroke edges so defined until the position is found where the deviation from the minimum COL and the maximum COL is at a minimum. If there are several such positions, the position with the highest value of PCS80 % shall be chosen (see 5.4.6.5).

5.4.6.4.2 Characters with cut-off
All characters, for all print tolerance ranges, shall be centred in this way without considering the cut-off limit lines [see figure 7 a)]. For characters of print tolerance range Z which do not satisfy the conditions of the following specifications, the best fit positioning shall be repeated. This second best-fit step shall be performed using the cut-off limit lines [see figure 7 b)]. Thereafter, all parameters under consideration shall be tested using the cut-off limit lines.

5.4.6.5 PCS80 %
The smallest value among the highest 80 % of the PCS values measured within a character along the centreline is called PCS80 %. It shall satisfy the following conditions:
PCS80 % > 0,60 in range X
PCS80 % > 0,50 in range Y
PCS80 % > 0,35 in range Z
In certain OCR applications, the values given for PCS80 % for range Y and range Z might be too stringent. Deviations from these values shall be agreed upon by the parties concerned.

5.4.6.6 PCSmax
PCSmax is the highest value that can be found while moving the aperture over a distance of 0,2 mm (0.008 in) along the centreline (see annex C).

5.4.6.7 PCSmin
PCSmin is the lowest value that can be found while moving the aperture over a distance of 0,2 mm (0.008 in) along the centreline (see annex C).

5.4.6.8 Contrast variation within a character
The variation of contrast within a character is defined by the contrast variation ratio:

$$\mathrm{CVR} = \frac{\mathrm{PCS}_{\max}}{\mathrm{PCS}_{\min}}$$

CVR shall satisfy the following conditions:
CVR < 1,50 in range X
CVR < 1,75 in range Y
CVR < 2,0 in range Z

5.4.6.9 Voids
Voids are areas inside the minimum COL which are of lower density than the surrounding area. To test whether a void is allowable, its PCS value shall be determined (see annex C). Voids shall be allowed if one of the following conditions is satisfied:
PCSmin > 0,40 for range X
PCSmin > 0,35 for range Y
PCSmin > 0,30 for range Z

5.4.6.10 Character shape and strokewidth

5.4.6.10.1 Definition of stroke edge
To define the stroke edges, determine the arithmetic average PCS3 of all PCS values equal to or greater than PCS80 %, measured along the gauge stroke centreline or the cut-off stroke centreline. The stroke edges are then defined by PCS4:

$$\mathrm{PCS}_4 = \begin{cases} 0{,}5\,\mathrm{PCS}_3 & \text{if } \mathrm{PCS}_3 \geq 0{,}6 \\ 0{,}3 & \text{if } \mathrm{PCS}_3 < 0{,}6 \end{cases}$$

5.4.6.10.2 Character shape and strokewidth requirements
The criteria for character shape and strokewidth shall be applied as follows.

If best fit positioning was performed without cut-off limit lines, the character shape as defined by the stroke edges according to 5.4.6.10.1 shall fill the minimum COL and at the same time not extend beyond the maximum COL.

If best fit positioning was performed with cut-off limit lines, the character shape so defined shall fill the minimum COL, except for the stroke element affected by cut-off. The cut-off stroke element shall fill the minimum COL at least up to the cut-off limit line. The character shape shall not extend beyond the maximum COL.

Allowable exceptions to the above requirements are defined in 5.4.6.10.3.

5.4.6.10.3 Allowed irregularities
Occasional violations of the maximum COL, of the minimum COL and of the cut-off limit lines shall be allowed if both of the following hold, whichever part of the limit lines is violated:
- the irregularities do not exceed 0,3 mm (0.012 in), measured along the affected limit line;
- adjacent irregularities or groups of irregularities are at least 0,7 mm (0.03 in) apart.
The distance between two irregularities shall be measured along the corresponding limit line. If irregularities appear both on the maximum COL and on the minimum COL or the cut-off limit line, the distance shall be measured along the minimum COL or along the cut-off limit line, respectively.

In addition to the above, the specifications for voids (see 5.4.6.9) and for spots (see 5.4.6.11) shall be considered for allowable irregularities.

The actual strokewidth of a stroke element of a printed image is the distance between the stroke edges defined in 5.4.6.10.1, measured perpendicularly to the gauge stroke centreline or the cut-off centreline. These measurements shall only be made over a distance of up to 0,3 mm (0.012 in) on either side of the corresponding centreline.

NOTE - Where it is sufficient to check printed characters only for compliance with this International Standard, it is not necessary to evaluate individual strokewidth values. For mass investigations of large quantities of printed characters in statistical terms, evaluation of strokewidths is useful.

5.4.6.11 Spots
Areas located outside the maximum COL but within the rectangle Q are spots if their PCS is greater than PCS5, defined as follows:

$$\mathrm{PCS}_5 = \begin{cases} k\,\mathrm{PCS}_{\min} & \text{if } k\,\mathrm{PCS}_{\min} < \mathrm{PCS}_4 \\ \mathrm{PCS}_4 & \text{if } k\,\mathrm{PCS}_{\min} > \mathrm{PCS}_4 \end{cases}$$

where k = 0,65 for range X, 0,70 for range Y and 0,75 for range Z.

Such spots within the rectangle Q shall be allowable if their surfaces never cover more than 10 % of the surface of a circle of 1 mm (0.04 in) diameter, wherever within Q the centre of that circle is positioned.

6 Character positioning

6.1 General
Character positioning specifications are needed to ensure that each OCR character is read by the OCR reading device without interference from non-OCR characters or from other elements. This clause contains basic specifications for the positioning of characters, designed to accommodate the general requirements of OCR devices. It does not contain all the rules that may be necessary for a particular application; such additional rules will be the subject of other International Standards.

6.2 Document reference edges
A number of specifications in this clause relate to the document reference edges. These can be horizontal and/or vertical edges.

6.3 Character boundary (see figure 14)
The character boundary is the smallest rectangle that has one side parallel to the document reference edge and contains a character when aligned at its stroke edges (see 5.4.3.9).

Figure 14 - Character boundary (character boundaries; reference edges)

6.4 Character skew
The character skew is the rotational deviation of the printed image from the intended orientation relative to a document reference edge. Character skew shall not exceed 3°.

6.5 Line boundary
A line boundary is the smallest rectangle parallel to the document reference edge which contains all the character boundaries of the characters of the line.

Figure 15 (character boundary; line boundary)

6.6 Field
A field is a specific portion of a line and comprises at least one character. It may be treated as a unit of information. A line may comprise several fields. Dimensional specifications for fields do not appear in this International Standard.

6.7 Horizontal positioning of characters within a line

6.7.1 Character separation within a line
Character separation within a line is the horizontal spacing between the two adjacent vertical sides of the character boundaries of two characters within the same line boundary. The character separation shall not be less than the nominal strokewidth specified for each size in 5.3.1.

6.7.2 Character spacing within a line (see figure 16)
Character spacing is the horizontal distance between the vertical centrelines of the character boundaries of two characters within the same line boundary. It is corrected by the distance that would exist between these vertical centrelines if the same two characters were superimposed in their nominal position. This correction is derived from the nominal drawings and from the references used for the nominal alignment.

Character spacing shall not be less than:
2,30 mm (0.09 in) for sizes I and III
3,30 mm (0.13 in) for size IV

Two characters are adjacent if their character spacing is less than:
4,60 mm (0.18 in) for sizes I and III
6,60 mm (0.26 in) for size IV

Some printing methods and devices, such as letterpress, variable pitch typewriters and some journal tape printers, produce printing that does not meet the character spacing specification for all combinations of characters within the printer's repertoire. Some OCR scanners can permit this exception as long as the character separation requirements of 6.7.1 are satisfied. When considering the installation of an OCR system of this type, close liaison with the printer and reader manufacturers is advised.

Key to figure 16: centrelines of the character boundaries; X = shift of the nominal position of capital letter F with respect to other capital letters; (Y - X) = character spacing between capital letter F and capital letter U; character separation; character spacing.
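The computer-aided thresholds chain together. PCS1 gives the preliminary edge level PCS2 (5.4.6.4.1). PCS3 gives the stroke edge level PCS4 (5.4.6.10.1). PCSmin and PCS4 give the spot level PCS5 (5.4.6.11). CVR is then compared with its range limit (5.4.6.8). The Python sketch below reproduces only these formulas, assuming the PCS samples have already been extracted from the scan. The function names are hypothetical, and the sampling and integration over the 0,2 mm area described in annex C are not modelled.

```python
K_FACTOR = {"X": 0.65, "Y": 0.70, "Z": 0.75}   # k in 5.4.6.11
CVR_LIMIT = {"X": 1.50, "Y": 1.75, "Z": 2.0}   # 5.4.6.8


def _mean(values):
    if not values:
        raise ValueError("no PCS values above the selection threshold")
    return sum(values) / len(values)


def preliminary_edge_level(pcs_in_q):
    """PCS2 = 0,5 * (PCS1 + 0,3), where PCS1 is the mean of all PCS >= 0,3 in rectangle Q."""
    pcs1 = _mean([v for v in pcs_in_q if v >= 0.3])
    return 0.5 * (pcs1 + 0.3)


def stroke_edge_level(centreline_pcs, pcs80):
    """PCS4 from PCS3, the mean of centreline PCS values >= PCS80 %."""
    pcs3 = _mean([v for v in centreline_pcs if v >= pcs80])
    # 0,5 * 0,6 = 0,3, so both branches agree at PCS3 = 0,6.
    return 0.5 * pcs3 if pcs3 >= 0.6 else 0.3


def spot_level(pcs_min, pcs4, tolerance_range):
    """PCS5: k * PCSmin, but never above PCS4."""
    return min(K_FACTOR[tolerance_range] * pcs_min, pcs4)


def cvr_ok(pcs_max, pcs_min, tolerance_range):
    """True if CVR = PCSmax / PCSmin is below the limit for the range."""
    return pcs_max / pcs_min < CVR_LIMIT[tolerance_range]
```

Writing PCS5 as a `min()` is equivalent to the two-case definition, because both cases give the same value when k·PCSmin equals PCS4.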
Figure 16 - Character spacing within a line

6.8 Character alignment within a line
Character alignment is the vertical distance between the lower side of the character boundary of one character and the projected lower side of the character boundary of another character within the same line boundary. It is corrected by the vertical distance that would exist between the lower sides of the two character boundaries if the same two characters were superimposed in their nominal position. This definition does not apply to the character LONG VERTICAL MARK (see 6.8.3).

6.8.1 Adjacent character alignment (see figure 17)
Adjacent character alignment shall be measured according to the above procedure. It shall not exceed:
0,65 mm (0.026 in) for size I
0,90 mm (0.035 in) for size III
1,10 mm (0.043 in) for size IV

6.8.2 Character alignment within a line
Character alignment within a line shall be measured according to the above procedures. It shall not exceed:
1,30 mm (0.05 in) for size I
1,60 mm (0.07 in) for size III
2,20 mm (0.08 in) for size IV

6.8.3 Long vertical mark alignment
The character LONG VERTICAL MARK shall extend beyond the top and the bottom boundaries of any neighbouring character within the same line, except lower case characters with descenders.

6.9 Printing area
The printing area is a rectangle that has one side parallel to the document reference edge and is intended to contain only the machine-readable characters of one line. The line boundary of a line of printed characters shall be completely inside the printing area.

6.10 Clear area
A clear area is the region of a document reserved for one line of OCR characters and the clear space around these characters. The clear area surrounds the printing area symmetrically. The location and dimensions of clear areas shall be determined by the nature of the individual application and by the requirements specified in this clause. The distances a, b, c and d between the corresponding boundaries of the printing area and the clear area should not be less than 2,5 mm (0.1 in).

Readers able to read several lines on the same document simultaneously use a number of clear areas and printing areas defined on the document. For this type of reader, 6.12 and 6.13 apply. For two successive lines, the clear areas of the two lines may overlap, i.e. the clear space between the lines may be shared.

Figure 17 - Character alignment within a line (X = shift of the nominal position of the plus sign with respect to the digits; (Y - X) = character alignment between plus and three)

6.11 Margin (see figure 18)
The distance between any boundary of the printing area and the nearest parallel paper edge is called the margin. A margin shall normally be at least 6,3 mm (0.25 in). Where manually operated serial entry devices (for example typewriters) are used, top and bottom margins of at least 25,4 mm (1 in) are recommended.

6.12 Line separation
Line separation is the vertical distance between the upper side of the line boundary of a line of print and the lower side of the line boundary of the line of print immediately above. The parameters that influence line separation are the line pitch specification, line skew, vertical alignment, character height and strokewidth. The line separation shall not be less than:
0,65 mm (0.026 in) for size I
1,50 mm (0.06 in) for size III
2,60 mm (0.08 in) for size IV
If character sizes are intermixed, the line separation limit for any pair of lines shall be that applicable to the largest character in the two lines.

6.13 Line spacing (see figure 19)
Line spacing is the vertical distance between the average horizontal centreline position of all characters printed on one line and that of all characters printed on the next line.
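Under 6.12, a pair of lines with intermixed character sizes must meet the line separation limit of the largest size present. The Python sketch below encodes that rule with the limits listed above, together with the 6,3 mm margin of 6.11 and the recommended 2,5 mm clear-area distances of 6.10. The function names are hypothetical, and sizes are passed as the labels "I", "III" and "IV". Picking the largest limit gives the same result as picking the largest size, because the limits listed grow with size.

```python
MIN_LINE_SEPARATION_MM = {"I": 0.65, "III": 1.50, "IV": 2.60}  # 6.12
MIN_MARGIN_MM = 6.3        # 6.11, normal minimum margin
MIN_CLEAR_DISTANCE_MM = 2.5  # 6.10, recommended a, b, c, d


def line_separation_ok(separation_mm, sizes_in_upper_line, sizes_in_lower_line):
    """Check 6.12 for one pair of lines; mixed sizes use the largest size's limit."""
    sizes = set(sizes_in_upper_line) | set(sizes_in_lower_line)
    limit = max(MIN_LINE_SEPARATION_MM[s] for s in sizes)
    return separation_mm >= limit  # "shall not be less than"


def margin_ok(margin_mm):
    """Check the normal minimum margin of 6.11."""
    return margin_mm >= MIN_MARGIN_MM


def clear_area_ok(a, b, c, d):
    """Check the recommended distances between printing area and clear area (6.10)."""
    return min(a, b, c, d) >= MIN_CLEAR_DISTANCE_MM
```

For example, a 1,0 mm separation passes between two size I lines but fails when either line contains size III characters.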
The line spacing shall not be less than:
4,20 mm (0.16 in) for size I
4,90 mm (0.19 in) for size III
5,30 mm (0.21 in) for size IV
If character sizes are intermixed, the limit for the largest size applies. When lower case size I characters are used, the line spacing shall not be less than 4,90 mm (0.19 in).

Figure 18 - Margin (a, b, c, d ≥ 2,5 mm)

Figure 19 - Line spacing (line boundaries; average centrelines of character boundaries; line spacing)

Annex A
Paper characteristics and measurements
(Not part of this International Standard.)

A.1 Spectral properties

A.1.1 Significance of spectral properties for OCR documents
An OCR scanner is usually responsive to a restricted band of optical wavelengths. Typically, these scanners respond to the blue-green and green wavelengths or to the near infra-red wavelengths. It is therefore a fundamental requirement that the paper used for an OCR document be a good reflector in the wavelength ranges of the optical scanner response.

A.1.2 Colour
It is strongly recommended that the paper for an OCR document be white. White paper is essentially non-selective to wavelengths of light within the range of interest for OCR scanners. Consequently, if white paper is used, no conflict of spectral properties will occur. The specification excludes the use of most coloured papers, especially those with a definite and positive visual indication of colour. If the colour is slight and essentially uniform throughout the OCR area of the documents, such papers may comply with the specifications on average reflectance.

A.1.3 Notes on measurements

A.1.3.1 Means of realizing B 900
To implement the B 900 measurements, the following components may be used:
Illumination source: incandescent lamp.
Sensor: silicon phototransducer.
Glass filter: a low-frequency pass filter with cut-off at about 800 nm.

A.1.3.2 Fluorescent additives
A low level of fluorescence may be unavoidable, for example as a result of recycling paper. Nevertheless, efforts should be made to minimize this contamination, and fluorescent additives should generally be excluded in the manufacture of paper made for OCR use. This is necessary to avoid difficulties both in reading (with particular equipment) and in sorting (where fluorescent materials are added by the user). It is also recognized that other readers can tolerate fluorescent additives deliberately included for identification purposes.

A.2 Paper opacity

A.2.1 Significance of paper opacity
The opacity indicates how much the reflectivity of the paper of an OCR document changes because of the backing material at the time of scanning. If the document transport system of the OCR device provides a known uniform reflective surface at the time of scanning, a moderately opaque paper may be usable. However, some systems scan the document while it is backed by other printed documents, or have a transport system that provides a non-uniform backing surface. In such cases, a more opaque paper should be used, or a higher PCS value should be required for OCR information.

A.2.2 Recommendations
The minimum opacity required for an OCR paper depends on the means of scanning and on the application. In general, opacity is related to the grammage of the paper: the higher the grammage, the greater the opacity. Consequently, there is a similar relationship between opacity and paper thickness, although the use of filler and coating materials also has an effect.

In general, paper with an opacity exceeding 85 % should be used. Papers of lower opacity should be used only if the application requires it, and only after the scanner optical system has been considered. Papers with an opacity of less than 70 % should not be used.

Many inks permeate the paper to a considerable depth. Applications requiring an OCR document to be printed on both sides may require a higher opacity or a thicker paper to compensate for this effect.

A.3 Paper gloss

A.3.1 Significance of gloss for OCR documents
Gloss is the property of a surface responsible for a lustrous or mirror-like appearance. It is a phenomenon related to the specular reflection of incident light. Gloss causes more of the incident light to be reflected specularly and less to be scattered. It occurs at all angles of incidence and should not be confused with grazing-angle specular reflection, often referred to as sheen. Paper gloss is undesirable for OCR systems because it changes the effective brightness of the paper and thus affects the print contrast signal.

A.3.2 Recommendations
Paper for OCR documents should be restricted to the low gloss varieties. Coated or super-calendered papers, and other papers with a glossy appearance, should be avoided.

A.4 Variation in paper reflectance
Reflectance measurements performed with a very small aperture at a variety of positions on the paper surface vary from point to point. To distinguish them from measurements performed visually, measurements made with a microscopic aperture are called Rf. These variations shall not exceed a given limit. The average variation in reflectance is defined by the variation coefficient of Rf. The maximum variation in reflectance, named f, is defined as the ratio of the highest to the lowest value of Rf.

A.4.1 Apparatus arrangement
Illumination: incandescent lamp.
Geometry of illumination: one source at 45° with respect to the paper surface; illuminated area large compared with the measuring aperture.
Geometry of scanner: 90° with respect to the paper surface; aperture 0,2 mm (0.008 in) diameter at sample surface.
Spectral response: see table 7.

Table 7
Size    Peak (nm)      Bandwidth at 50 % level (nm)   Detector
I       425 to 460     30 to 60                       Wide band in the visible spectrum
II      530 to 570     30 to 60                       Wide band in the visible spectrum
(III)   620 to 680     30 to 60                       Wide band in the visible spectrum
IV      800 to 1 000   200 to 400                     Silicon

White reference: reflectance measurements shall be referred to the perfect reflecting diffuser (100 % reflectance). In practice, however, barium sulphate (BaSO4) may be used instead and gives sufficient accuracy. In case of disagreement, the measurements shall be based on the perfect reflecting diffuser, and the measuring arrangement must be calibrated to the average reading obtained from these measurements.

Measurements shall be carried out in the spectral range corresponding to the one used in the particular reading device (size III is, at present, of minor significance). Where this range is not known, the limits specified above shall be complied with for all sizes. Experience has shown that compliance with the limits of size IV is sufficient in this case.

A.4.2 Requirements
The variations in reflectance should be established with a single sample against a black background whose reflectance does not exceed 3 %. The average variation from the measured mean value Rf shall not result in a variation coefficient greater than 3,5 %. In addition, assuming a normal distribution, a maximum of 1 % of all measured values may lie outside the range Rf·(1,00 ± 0,101), which corresponds to a limit of f = 1,20. Both limiting values shall be complied with. The mean value of Rf obtained according to A.4.3 shall not be more than 5 % below the minimum specified for the luminous reflectance factor in 4.2.3 and 4.2.4.

A.4.3 Test procedure and evaluation
The paper shall be tested on the top side, in both the machine direction and the cross direction. Either of the procedures given in A.4.3.1 and A.4.3.2 may be selected.

A.4.3.1 Measurement at discrete points
The variation in paper reflectance Rf shall be measured at 200 points over a rectangular measurement area of 20 mm (0.78 in) by 40 mm (1.57 in). The centres of the individual measuring points shall be at least 2 mm (0.08 in) apart. The mean value, the standard deviation and the variation coefficient shall be established from these 200 points. After the highest and the lowest values have been discarded, the ratio of the highest to the lowest of the remaining values should not exceed 1,2.

Evaluation at 200 points is sufficient only when the samples do not come within 10 % of the limit (i.e. a variation coefficient of 3,15 % and an extreme-value ratio of 1,18). If the samples lie above this limit, five sets of measurements shall be made following the above procedure (i.e. at least 1 000 measuring points). The suitability of a delivery batch may be evaluated by taking several samples to obtain the required minimum of five sets of measurements.

A.4.3.2 Continuous measurement
Five continuous bands, each 40 mm (1.57 in) long and spaced 2 mm (0.08 in) apart, shall be tested over a rectangular measurement area of 20 mm (0.78 in) by 40 mm (1.57 in). An analogue graph is obtained from the values. After the mean value has been evaluated, its compliance with the requirements in the last paragraph of A.4.2 shall be checked. Following this, the highest and lowest reflectance values for each band are ignored. The remaining values are ordered sequentially, and the ratios highest-to-lowest, next-highest-to-next-lowest, and so on, shall be evaluated. ... procedure ... largest discarded.
Otherwise, Straight-line the following constants The results thus obtained by magnitude, and the five Should the sixth value be less than 1,18, then this suffices for the evaluation shall be carried out. are drawn at reasonable intervals on the reflectance of the variation in paper reflectance. graph, starting at the extreme. For each and the straight line which intersects the curve, the total horizontal reflectance. Should the fluctuation the standard exceeding in reflectance length of the sections which lie above the curve shall be measured, The values thus obtained shall be plotted on a probability the probability ratio of this to the total length of the line shall be calculated. of reflectance, graph against of the varia- be a normal distribution, curve will be a straight line. The mean value deviation and the variation coefficient may be obtained from this graph, and the proportion tion in reflectance the value Rf (1,OO + 0,101 may be determined. of a delivery batch the above total scanned length of 200 mm (7.87 in) is a minimum, and should be com- For the suitability evaluation posed of investigation over 40 mm (1.57 in). of a number of samples, each sample being scanned over at least two bands and each band being measured 29 IS 12736 : 1989 IS0 1831 : 1980 Annex Characteristics 6 image of the printed Standard.) (Not part of this International B.l General specifies the requirements for optimum reading system performance. This Annex The specifications cess. should be met by all print as far as possible in the presence of the random effects which occur in any printing pro- The design of printers and the selection of supplies should assure maximum tions may occasionally reader performance not be met, but the frequency required. compliance with this annex. 
In any system the specifica- with which this is allowed to occur should be carefully studied in the light of the B.2 Best fit of the best fit allows for its determination will not select identical the values obtained positions. with high accuracy by the instrumented in the selection method. With the visual method, companies dif- The definition ferent operators ferences between In other words, reproducibility Tests have been conducted in which operators of different measured method. the same samples. These tests have shown that slight differences of the best fit position lead only to negligible difis not a critical operation with regard to the from the visual method and those obtained for the same samples by the instrumented of the best fit position by the visual method it appears that the selection of the measurements. 8.3 With Basic values the instrumented (see figure 20) method, most print quality parameters are derived from the PCS basic values measured point selected along its centreline. point. as specified in 5.4.5.5.1. These basic values depend, for each printed character, on the starting The tests men- tioned above have shown that the print quality parameters This is due to the fact that all print quality parameters are not affected by the choice of this starting are obtained as the average of at least three basic PCS values (see for example the highest 50 % and the lowest PCS,i, and PCS,,,). PCSBo % too is not an isolated basic PCS value, but it is a limit value between 20 % of points (in statistical language it is a "quantile") as shown in the figure. Frequency of occurrence A PCS80 % PCS basic values Figure 20 - Basic values 8.4 Spectral bands for PCS of printed information, in PCS, is obtained range of interest. it is necessary that a good contras?exists between the printed image and the paper. 
Formachine This contrast, absorption recognition expressed in the spectral when the paper has a good reflectance and the print is dense enough to provide a good 30 IS 12736 IS0 : 1989 1831 : 1980 Reading devices usually have a spectral responsf in the visible or the near-M spectrum. A printing ink provides good absorbance in one or both bands, depending on its composition. For example, black pigments tend to absorb light in both bands specified for the visible range as well as that specified for the near-IR, but dyes are more selective and usually yield the best absorption in the visible region. and OCR systems, it is impossible inks would to specify a single spectral range which conbe sufficiently absorbent. Because of the diverse nature of printing equipment tains the spectral responses of all reading devices and in which all printing The choice of which of the three specified spectral bands should be used, therefore, the application concerned. The following considerations apply depends on the reading and printing devices in : it is sufficient to choose the spectral band(s) appropriate to these - if the characteristics of all readers in the system are known, readers; printing which is required to satisfy the PCS specifications of the printing inks; in the visible range imposes the least restriction upon the spectral characteristics the only print which can meet the spectral requirements of all reading systems is that which conforms to the specification in all three bands specified. Print on white paper with ink of a high carbon black content will in general meet this requirement. This consideration also applies in applications where the reading systems to be used are not known when the application . PCS,,, can be approximated by is being defined. B.5 Average PCS (PCS,,,) of stroke edges, For a rapid inspection : pw-m,, + PCS@J y0 2 For a rigorous assessment PCS,,, shall be calculated as indicated in 5.4.5.10.1. 
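The discrete-point evaluation of A.4.3.1, together with the limits of A.4.2, reduces to a few statistics on one set of readings. The Python sketch below computes them. It is an illustration only: it assumes the sample standard deviation, reads "Rf (1,00 ± 0,10)" as a band of ±10 % around the measured mean, and the function names are hypothetical. The continuous method of A.4.3.2 is not covered.

```python
import statistics

def evaluate_discrete_points(rf_values):
    """Summarize one set of microscopic-aperture readings Rf (A.4.2, A.4.3.1)."""
    values = sorted(rf_values)
    if len(values) < 3:
        raise ValueError("need at least three readings")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed convention)
    lower, upper = 0.90 * mean, 1.10 * mean  # Rf (1,00 +/- 0,10) around the mean
    outside = sum(1 for v in values if v < lower or v > upper)
    trimmed = values[1:-1]  # discard the single highest and lowest readings
    return {
        "mean": mean,
        "cv_percent": 100.0 * sd / mean,
        "outside_fraction": outside / len(values),
        "trimmed_ratio": trimmed[-1] / trimmed[0],
    }

def meets_requirements(r):
    """CV <= 3,5 %, at most 1 % of readings outside the band, trimmed ratio <= 1,2."""
    return (r["cv_percent"] <= 3.5
            and r["outside_fraction"] <= 0.01
            and r["trimmed_ratio"] <= 1.20)

def needs_five_sets(r):
    """200 points suffice only if the sample stays 10 % clear of the limits."""
    return r["cv_percent"] > 3.15 or r["trimmed_ratio"] > 1.18

# 200 hypothetical readings alternating between 79 % and 81 %.
r = evaluate_discrete_points([79.0] * 100 + [81.0] * 100)
print(meets_requirements(r), needs_five_sets(r))
```

For this uniform sample the variation coefficient is about 1,25 % and the trimmed ratio about 1,03, so a single set of 200 points would be enough.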
B.6 Spots and voids

B.6.1 Definition

A printed image contains, in most cases, voids within the minimum COL and spots outside the maximum COL but close to the characters. These spots are defined as character-associated spots.

B.6.2 Significance of spots and voids

For machine recognition, it is essential that the print intensity of all parts of the printed image should be high enough to exceed a certain minimum value and be distinguishable from the background. These requirements are covered by the specifications for voids and spots.

B.6.3 Visual identification

The visual methods defined in 5.4.4.4 and 5.4.4.6 for the identification of spots and voids rely on the observer's estimation of the area and the reflectance contrast of the void or spot. Whilst an estimate of the area may readily be made, it is more difficult to assess accurately the reflectance contrast of the spots and voids. Therefore, great care must be taken in making these visual examinations.

B.6.4 Instrumented identification

The minimum PCS found within the outline of a character is a measure of the smallest useful signal that the character will produce in an OCR scanner. If the detection threshold is put above this value, the character will display voids. Because of the distinction between allowable (small) voids and non-allowable ones, the specification for voids is, in general, somewhat higher than this minimum. It is defined by the PCS threshold d above which all voids are considered allowable. Broadly speaking, it is a measure of the contrast between the character and its background. The specification for spots, likewise, is not the PCS level at which spots first appear but the threshold level e beyond which they are considered too large to be allowable. It is related to the intensity of background noise in the region of the character. The values d and e have been defined to take account of the different requirements for voids and spots for the print quality ranges X and Y.
As the incidence of voids increases, the print contrast diminishes until, at the limit, it is no greater than the level of reflectance irregularities in the paper. A decrease in the incidence of voids will tend to improve reading system performance. This can be achieved, for instance, at some extra cost, by a reduction in the allowed duration of ribbon life.

B.7 Strokewidth ranges

The variation in strokewidth from the nominal should be held to a minimum, since it can generally affect reader performance. Strokewidth range X requires a high quality printing process and careful control of maintenance and supplies. It cannot be met by some printers in common use for OCR. However, the tolerances which these printers normally produce do not necessarily extend to the full range Y. In such cases, printing performance should not be allowed to degrade beyond the normal level.

B.8 Spots remote from a character

The area of interest of a character is defined in 5.4.5.2 and 5.4.6.2 as an area twice the nominal character height by twice the nominal character width, centred on the character being measured. The PCS level and frequency of spots in the area of interest of a character are specified in 5.4.5.12 and 5.4.6.11. The size and frequency of spots remote from a character should also be strictly controlled. The size of spots should be minimized; printing smudges and regular patterns of dots should be avoided. Many reading operations are started upon detection of the first black point, and if a spot larger than 0,2 mm (0.008 in) occurs, the recognition process may begin. It is advisable that spots greater than 0,2 mm (0.008 in) be prevented from occurring.

B.9 Recommendation for lower-case OCR-B characters

For the following set of characters, a higher print quality is required, both in terms of PCS and strokewidth. Strokewidth variations should be maintained within range X.
abcdefghijklmn
opqrstuvwxyz

Figure 21 - Recommended lower-case OCR-B characters

Annex C
Computer-aided method (CAM) of print quality measurement
(Not part of this International Standard.)

C.1 Introduction

After recognizing the need to introduce print tolerance range Z, it was decided to define a third method of measurement, which uses an automatic print quality measurement system. This system consists of:
- a high resolution scanning device for digitizing the printed characters;
- a computer;
- a specialized program implementing the rules of the method, to evaluate the print quality parameters defined by this International Standard.

Under the sponsorship of the Federal German government, such an automatic measurement system, described in detail below, has been developed. Using this device, a complete measurement according to the rules of 5.4.6 takes approximately 2 to 3 min per character, depending on the material to be checked.

This system is located at the Forschungsinstitut für Mustererkennung (FIM), Breslauer Strasse 46, D-7500 Karlsruhe 1, Germany F.R.
- the program, written in FORTRAN, is available to any person or institution;
- this program will be maintained in order to implement changes (if any) to the present International Standard;
- printed material can be sent to the FIM for measuring purposes.

C.2 Scanning device

C.2.1 General

In order to computerize the measurement of OCR print quality parameters, it is essential to transform the optically visible image of a printed character into electrical signals. This is done by illuminating the character point by point on an orthogonal grid, and measuring the diffusely reflected light at each point. After an analogue-to-digital conversion, the scanned data are stored on magnetic tape.

At the time the decision in favour of the CAM method of measurement was taken, no suitable high precision scanning device was commercially available. However, a high resolution scanning device which had been constructed for research purposes was adapted to the needs of this OCR print quality standard.

C.2.2 Mechanical part

The mechanical part of the device consists of a rotating drum and a mirror mounted on a carriage (see figure 22). The document to be scanned is fixed to the black surface of the drum. The illuminating beam is deflected by the mirror on to the document, and the reflected portion of the light is collected by a scanning diode which is also mounted on the carriage.

To resolve the continuous rotation of the drum into single points, an incremental angle resolver is used. This resolver produces 43 000 single pulses per revolution of the drum, which corresponds to an angular resolution of 0,008°. The drum has a diameter of 137,6 mm, which yields a point-to-point distance of 10 µm at the surface of the drum. By taking every second pulse of the angle resolver, the selected point-to-point distance of 20 µm is obtained.

The carriage with the mirror is moved by a stepping motor. A single pulse to the motor causes a displacement of the carriage of 10 µm. Thus, an input of two pulses at a time yields a scanning line separation of 20 µm at the surface of the drum. The precision of the spatial resolution of the scanning device was checked by illuminating photographic paper and measuring the distance point-to-point and line-to-line.

The size of the document to be scanned is limited by the facts that up to 4 096 points per revolution of the drum can be taken and that a displacement of the carriage of 240 mm is possible. This allows for a document size of approximately 80 mm in width and 240 mm in height.

C.2.3 Illumination source

A laser is used as the illumination source because of the intensity of its light and the facility with which it can be deflected, modulated and focused. The laser beam hits the surface of the drum at an angle of 90° and is focused to a spot of 20 µm in diameter by appropriate lenses.
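The figures quoted in C.2.2 are mutually consistent, as the following arithmetic sketch shows: a 137,6 mm drum divided into 43 000 resolver pulses gives about 10,05 µm and 0,0084° per pulse, taking every second pulse gives the 20 µm grid, and 4 096 points at that pitch span roughly 82 mm, in line with the quoted document width of approximately 80 mm. The constant names are illustrative only.

```python
import math

DRUM_DIAMETER_MM = 137.6    # drum diameter (C.2.2)
PULSES_PER_REV = 43_000     # incremental angle resolver pulses per revolution
MOTOR_STEP_UM = 10.0        # carriage displacement per stepping-motor pulse
POINTS_PER_REV_MAX = 4_096  # points that can be taken per drum revolution

def drum_resolution():
    """Return (angular step in degrees, arc length per pulse in micrometres)."""
    circumference_um = math.pi * DRUM_DIAMETER_MM * 1000.0
    return 360.0 / PULSES_PER_REV, circumference_um / PULSES_PER_REV

angle_deg, pulse_pitch_um = drum_resolution()
point_pitch_um = 2 * pulse_pitch_um  # every second resolver pulse is used
line_pitch_um = 2 * MOTOR_STEP_UM    # two motor pulses per scan line
max_width_mm = POINTS_PER_REV_MAX * point_pitch_um / 1000.0

print(f"{angle_deg:.4f} deg and {pulse_pitch_um:.2f} um per pulse")
print(f"{point_pitch_um:.1f} um x {line_pitch_um:.1f} um grid, width {max_width_mm:.1f} mm")
```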
In order to realize measurements in the different spectral bands as stated in 3.2, three spectral lines of the laser light are provided:
- 455 nm (blue);
- 515 nm (green);
- 633 nm (red).

With the knowledge of the spectral responses of the OCR paper and the OCR ink, which are continuous and unchanging within the limits of approximately 450 nm up to 1 000 nm, evaluation in all spectral bands required is linearly extrapolated with a high degree of accuracy from measurements in the spectral bands provided. Usually, when the spectral band of the OCR reading machine is known, a single scanning process is sufficient, applying the nearest of the three spectral lines of the laser. Otherwise three scanning processes of the same document, with all three spectral lines, are necessary to give the precision required.

C.2.4 Scanner diode

A silicon scanner diode with a comparatively large sensitive area, mounted on the carriage at an angle of 45° to the illumination beam, collects a portion of the diffusely reflected light and transforms it into corresponding electrical signals. The output of the diode is digitized via an analogue-to-digital converter into 6-bit bytes. Thus, a grey scale resolution of 64 grey levels is obtained.

C.2.5 Calibration

All reflectance measurements must be based on the white reference mentioned in 3.2. This is implemented by means of a standardized grey scale with 20 different grey levels ranging from white (paper) to black, where the change of the grey levels follows a logarithmic function. The (absolute) reflectance of this grey scale with reference to barium sulphate has been measured for all three spectral lines of the laser on a high precision reflectometer, and the values are stored in the computer. For each series of documents to be measured, this grey scale is also scanned. Thus, the grey values actually scanned are transformed into absolute reflectance values by means of the known correspondence of values of the grey scale (the transformation is performed during the pre-processing phase, see C.4.1). This is valid for the 20 grey levels of the grey scale only. The remaining grey levels, up to the maximum of 64 grey levels, are evaluated by interpolation.

C.2.6 Storage for scanned data

The actual grey values scanned and digitized as 6-bit bytes are transferred to a magnetic tape, where they are stored using a storage density of 32 bpmm. Scanning one document of 80 mm in width and 100 mm in height yields one magnetic tape of 720 m filled with data.

C.3 Generation of COL gauges

C.3.1 Gauges forming matrices

The correlation between a COL gauge and a character, in order to obtain the best fit position, is performed by shift operations in matrices. Thus the gauges of all characters must each have the format of a matrix. The generation starts with a string of co-ordinates of the centreline of each character as defined and standardized in ISO 1073. The co-ordinates are given with an accuracy of 1 µm. In the first step, this string of co-ordinates per character is transformed into a matrix-like grid where the distance between any two adjacent points is 20 µm. In the second step, the preliminary maximum COL and minimum COL are generated by moving circles of 500 µm and 200 µm diameter (for print tolerance ranges Z and Y respectively) along the co-ordinates of the skeleton. A corresponding procedure applies to the gauges of print tolerance range X.

In the next step, the fairing radii and other special rules concerning inner and outer corners, the free ends of strokes, etc. are implemented separately for all print tolerance ranges. The results of this step are the final COL gauges of ranges X and Y. For tolerance range Z, the cut-off limit lines are superimposed on the final COL gauges of range Y to yield four different gauges per character (i.e. one gauge which is affected by cut-off at the top, the second affected at the right side, etc.). For each of the four gauges per character, a new gauge stroke centreline is built up according to the rules of 5.3.7 for that part of the gauge which is affected by cut-off. Not all characters are affected by cut-off at four sides; some characters are not affected at all. In these cases of non-applicability, the original undistorted gauge is inserted so that, for the sake of uniformity, each character of print tolerance range Z is represented by four COL gauges.

C.3.2 Gauges forming strings

When testing stroke edge irregularities according to 5.4.6.10, the length of COL violations and the distances between them are measured. For these operations a string-like format of the COLs is suitable. Thus, for each character, two strings of co-ordinates, one for the maximum COL and one for the minimum COL, are generated by extracting the COL co-ordinates of the respective matrix and compiling them into strings. Note that each character of print tolerance range Z yields 8 COL strings.

C.4 Pre-processing of scanned characters

C.4.1 Automatic location of the characters

The input to this phase of pre-processing comes from the magnetic tape where the grey values of the document, scanned point-by-point and line-by-line, are stored. The first step aims to separate the inked parts from the non-inked (white) parts of the document. Because of the differing materials used for paper and ink, and because of the variation of paper reflectance, no fixed grey level threshold can be given. Instead, an adaptive threshold St is evaluated for each document itself by the following procedure: a histogram of the grey values of all points of the first 100 scan lines of the document, which must be free of ink, is compiled.

It has been found that a suitable threshold St is located at a grey level which is exceeded by only one scanned point per thousand (these are the dark peaks of the paper: small spots, dirt in paper, etc.). All grey values exceeding St (a typical value for St is grey level 15, where grey level 0 stands for white and grey level 63 stands for black) are considered to belong to a printed image; the remaining scanned grid points of the document are considered to belong to non-inked paper.

To locate the characters of the first character line printed on the document, the next 250 scan lines following the first 100 scan lines which have been used for the histogram are read from the tape and rearranged as a matrix on a magnetic storage disk. Each scan line of this group is tested with respect to points exceeding St. A scan line is considered to bear character information if the threshold St is exceeded more than a certain number of times, depending on the length of the printed line (for example five times for a line of 4 096 grid points). Thus, a binary vector with 250 components (one for each scan line) is evaluated, where the ONES indicate that the corresponding scan line bears printed character information and the ZEROS indicate that white paper has been scanned (see figure 9).

The co-ordinates of the beginning and the end of the printed character line can be extracted at once. To find the position of each printed character of this character line, a similar procedure is performed on the columns of the matrix of 250 scan lines. This yields another vector with 4 096 components, from which the co-ordinates of the beginning and the end of each character can also be extracted (see figure 10). The same procedure applies to all subsequent groups of 250 scan lines until the end of the document is reached.
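The storage figures of C.2.6 can be checked in the same way. The sketch below assumes that each 6-bit grey value occupies one byte on tape and reads "32 bpmm" as 32 bytes per millimetre of tape; both readings are assumptions, not statements of the standard. Under them, an 80 mm by 100 mm document at 20 µm produces 20 000 000 grey values, about 87 % of the nominal 23 040 000 bytes on a 720 m tape.

```python
GRID_MM = 0.020           # 20 um point and line pitch
TAPE_DENSITY_PER_MM = 32  # "32 bpmm", read here as bytes per millimetre (assumption)

def grey_values_for(width_mm, height_mm, grid_mm=GRID_MM):
    """Number of grey values produced by scanning a width x height area."""
    return round(width_mm / grid_mm) * round(height_mm / grid_mm)

def tape_capacity(length_m, density_per_mm=TAPE_DENSITY_PER_MM):
    """Nominal number of bytes on a tape of the given length."""
    return round(length_m * 1000 * density_per_mm)

needed = grey_values_for(80, 100)  # 4 000 points x 5 000 scan lines
available = tape_capacity(720)
print(needed, available, f"{needed / available:.0%}")
```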
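The character location procedure of C.4.1 (adaptive threshold, then ONES/ZEROS vectors over scan lines and columns) can be sketched with NumPy as follows. The function names are hypothetical, and the per-column count used to find individual characters (any point above St) is an assumption, since the text only says "a similar procedure" is used for the columns.

```python
import numpy as np

def adaptive_threshold(blank_lines, exceed_fraction=1e-3, levels=64):
    """St: lowest grey level exceeded by no more than `exceed_fraction` of the points.

    blank_lines: integer grey values (0 = white, 63 = black) of ink-free scan lines.
    """
    hist = np.bincount(np.asarray(blank_lines).ravel(), minlength=levels)
    total = hist.sum()
    above = total - np.cumsum(hist)  # points strictly darker than each level
    return int(np.argmax(above <= exceed_fraction * total))

def bearing_vector(block, st, min_count, axis):
    """1 for each scan line (axis=1) or column (axis=0) with more than min_count points above St."""
    counts = (np.asarray(block) > st).sum(axis=axis)
    return (counts > min_count).astype(np.uint8)

def runs_of_ones(vector):
    """(first, last) index pairs of each run of ONES, both inclusive."""
    edges = np.diff(np.concatenate(([0], vector, [0])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1) - 1
    return list(zip(starts.tolist(), ends.tolist()))

# Hypothetical data: 100 blank lines with a few dark paper peaks ...
blank = np.full((100, 4096), 5)
blank[0, :100] = 20
st = adaptive_threshold(blank)

# ... followed by a group of 250 scan lines carrying two "characters".
group = np.full((250, 4096), 5)
group[50:80, 100:140] = 60
group[50:80, 200:240] = 60
rows = bearing_vector(group, st, min_count=5, axis=1)  # five times per 4 096 points
(line_start, line_end), = runs_of_ones(rows)
cols = bearing_vector(group[line_start:line_end + 1], st, min_count=0, axis=0)
print(st, (line_start, line_end), runs_of_ones(cols))
```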
Special attention is necessary to cope with problems caused by the actual length and spacing of the printed lines, the character misalignment within a printed line, and certain characters like the "equals" sign or "semi-colon", etc.

C.4.2 Rectangle Q

The components of the vectors of the scan lines and columns indicate the maximum dimensions of the printed characters of a character line in the horizontal and vertical directions. Thus the co-ordinates of the circumscribing rectangle of each character are given. The centre of the printed character is computed. The rectangle Q, which defines the testing area for each character under inspection and whose dimensions are given in 5.4.6.2, is superimposed on the character symmetrically with respect to its centre.

C.4.3 Computation of PCS

The grey values within rectangle Q are transformed in three steps to yield the PCS values. In the first step, the grey values of the scanner output are transformed into values of absolute reflectance by means of the known correspondence with the values of the standardized grey scale. The second transformation step carries out the integration with the aperture of 0,2 mm in diameter: the average reflectance of all points covered by the aperture is computed and assigned to the point on which the aperture is centred. In the last step, the PCS value is computed for each point of the 20 µm scanning grid according to the definition of 5.4.6.3. The result is a matrix of 125 PCS values per line and 195 PCS values per column (B font, size I, numeric subset), subsequently termed the PCS matrix, which shows a printed character with some white paper around it at a resolution of 20 µm.
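The three transformation steps of C.4.3 can be sketched as below. The PCS definition of 5.4.6.3 is not reproduced in this section, so the sketch assumes the usual print contrast signal form PCS = (Rp - R)/Rp, with Rp the paper reflectance passed in as a parameter. Linear interpolation between calibration levels and edge replication at the matrix border are also assumptions, and all names are hypothetical. With a 20 µm grid, the 0,2 mm aperture covers the 81 grid points lying within 5 steps of its centre.

```python
import numpy as np

GRID_UM = 20.0       # scanning grid pitch (C.2.2)
APERTURE_UM = 200.0  # measuring aperture diameter, 0,2 mm

def grey_to_reflectance(grey, cal_grey, cal_reflectance):
    """Step 1: grey level -> absolute reflectance via the grey-scale calibration
    (linear interpolation between calibration levels is an assumption)."""
    return np.interp(grey, cal_grey, cal_reflectance)

def aperture_offsets(diameter_um=APERTURE_UM, grid_um=GRID_UM):
    """Grid offsets (dy, dx) covered by a circular aperture centred on a point."""
    r = diameter_um / (2.0 * grid_um)  # radius in grid steps (5 here)
    k = int(np.floor(r))
    return [(dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1)
            if dy * dy + dx * dx <= r * r]

def aperture_average(reflectance, offsets):
    """Step 2: mean reflectance of all points under the aperture."""
    k = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    padded = np.pad(reflectance, k, mode="edge")  # border handling: assumption
    h, w = reflectance.shape
    total = np.zeros((h, w), dtype=float)
    for dy, dx in offsets:
        total += padded[k + dy:k + dy + h, k + dx:k + dx + w]
    return total / len(offsets)

def pcs_matrix(grey, cal_grey, cal_reflectance, paper_reflectance):
    """Step 3: PCS = (Rp - R) / Rp at every grid point (assumed definition)."""
    r = aperture_average(grey_to_reflectance(grey, cal_grey, cal_reflectance),
                         aperture_offsets())
    return (paper_reflectance - r) / paper_reflectance

# One black grid point on white paper, with a two-point hypothetical calibration.
grey = np.zeros((21, 21), dtype=int)
grey[10, 10] = 63
pcs = pcs_matrix(grey, [0, 63], [0.8, 0.0], paper_reflectance=0.8)
print(len(aperture_offsets()), pcs[10, 10])
```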
C.5 Evaluation of print parameters

C.5.1 Input data

The evaluation of the print quality parameters of a character requires the following input data:
- PCS matrix;
- class of the character, font and size;
- print tolerance range;
- respective COL gauge as matrix;
- respective COLs and COL stroke centrelines as strings.

C.5.2 Best fit

Nothing is known as yet about the strokewidth, the stroke edges, the shape, etc. of the character, which is represented by its PCS matrix. To find the position with the least violation of the COL gauge, the character shape must be defined by preliminary stroke edges. This is done by thresholding the PCS matrix at a certain PCS level, PCS2, which has to be computed for each character according to the formula

$$\mathrm{PCS}_2 = 0{,}5\,(\mathrm{PCS}_1 + 0{,}3)$$

where PCS1 is the arithmetic average of all PCS values of the PCS matrix which exceed or are equal to 0,3.

Interpreting this formula, one should note:
- that it is widely acknowledged that PCS values less than 0,3, which indicate a very faint inking, should not be considered to belong to parts of printed characters;
- that adding 0,3 to the average value PCS1 warrants a threshold of at least PCS2 = 0,3, which is well above the paper noise.

Another formula is available to define the stroke edge of the printed character. However, the evaluation of that threshold depends on PCS values measured along the stroke centrelines, which in turn implies knowledge of the best fit position. Thus, using it for the computation of the best fit position would lead to an iterative procedure which may be very time consuming and does not necessarily converge.

The digitized PCS matrix and the COL gauge matrix are shifted horizontally and vertically against each other until the position is found where the sum of the area(s) outside the maximum COL covered by the printed image and the area(s) inside the minimum COL not covered by the printed image is a minimum. Note that this procedure of best fit evaluation restricts the skew permitted for a printed character to an amount of 0° to 5°, depending on its actual strokewidth. When, coincidentally, more than one position with the same COL gauge violation exists, a second criterion is used to determine the real fit position: for all such positions the PCS80 % value is computed, and the position with the highest PCS80 % is chosen. This criterion is deduced from the endeavour to measure the highest possible PCS values along the stroke centrelines.

This completes the best fit evaluation for characters of print tolerance ranges X and Y. Characters of range Z which do not satisfy the conditions of this International Standard after having been centred on the undistorted COL gauge of range Y must be reiterated. This implies centring the character on all four cut-off COL gauges one after the other and testing compliance with the conditions of this International Standard for all four centrings. If one of these four tests yields a positive result, the character complies with this International Standard. However, this best fit evaluation for characters of range Z is very time consuming. The procedure can be speeded up by centring the character on all four COL gauges with cut-off, choosing the best of the four resulting best fit positions, and testing compliance with the conditions of this International Standard only once. This will usually be the centring on that COL gauge which takes into account the distortions of that specific character most exactly.
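The best fit search of C.5.2 can be sketched as an exhaustive shift search over a thresholded PCS matrix. In this sketch the gauges are hypothetical boolean masks of equal shape (True inside the maximum COL, True inside the minimum COL), violation areas are counted in grid points, and ties simply keep the first position found; the PCS80 % tie-break and the range Z cut-off gauges described above are not implemented.

```python
import numpy as np

def preliminary_threshold(pcs):
    """PCS2 = 0,5 (PCS1 + 0,3), with PCS1 the mean of all PCS values >= 0,3."""
    strong = pcs[pcs >= 0.3]
    if strong.size == 0:
        raise ValueError("no PCS value reaches 0.3")
    return 0.5 * (float(strong.mean()) + 0.3)

def violation_area(printed, max_col, min_col, top, left):
    """Printed points outside the maximum COL plus unprinted points inside
    the minimum COL, with the gauge placed at (top, left)."""
    placed_max = np.zeros(printed.shape, dtype=bool)
    placed_min = np.zeros(printed.shape, dtype=bool)
    h, w = max_col.shape
    placed_max[top:top + h, left:left + w] = max_col
    placed_min[top:top + h, left:left + w] = min_col
    return int(np.count_nonzero(printed & ~placed_max)
               + np.count_nonzero(~printed & placed_min))

def best_fit(pcs, max_col, min_col):
    """Exhaustive shift search; returns (violation, top, left) of the first minimum."""
    printed = pcs >= preliminary_threshold(pcs)
    h, w = max_col.shape
    best = None
    for top in range(pcs.shape[0] - h + 1):
        for left in range(pcs.shape[1] - w + 1):
            v = violation_area(printed, max_col, min_col, top, left)
            if best is None or v < best[0]:
                best = (v, top, left)
    return best

# Hypothetical 12 x 12 PCS matrix holding a 4 x 2 bar, and a 6 x 4 gauge.
pcs = np.zeros((12, 12))
pcs[4:8, 5:7] = 0.9
max_col = np.ones((6, 4), dtype=bool)
min_col = np.zeros((6, 4), dtype=bool)
min_col[1:5, 1:3] = True
print(preliminary_threshold(pcs), best_fit(pcs, max_col, min_col))
```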
C.5.3 PCS within a character on the gauge which, exp!icitly, is given by the distance of displacement of Having determined the location of best fit of the character the centres of the PCS matrix and the COL gauge matrix, the gauge stroke centrelines are projected into the PCS matrix and measure- ment of PCS values along these centrelines is performed. To check compliance with the PCS 80 96 specification, all PCS values along the centrelines are compiled into a histogram. After deleting the lowest 20 % of all values the lowest value remaining is compared with the respective limit given in 5.4.6.5. 36 IS 12736 IS0 : 1989 1831 : 1980 c.5.4 PCS,,, to determine as follows the highest PCS value within the character while disregarding some very dark peaks. It is the aim of this specification This is implemented by the program : my, the character ZERO of font OCR-A, size I, includes approximately 360 grid points at the raster The gauge strokr centreline for, resolution of 20 pm. Each point of this set is taken as the starting point of a sub-set of points which extend in total to a distance of ex- actly 1 :-nm. Thus, for a straight horizontal ting a part of the centreline highest 20 XI of values, which amount This procedure character. of candidate evaluation segment of the centreline, 50 points of the centreline are assembled to a sub-set represenHaving deleted the for PCS,,,. of the are of exactly 1 mm length. The PCS values of all 50 points are compiled to 10 points in this case, the highest value remaining is repeated for all sub-sets of points which into a histogram. is taken as a candidate can be assembled along the centrelines values for PCS,,, For the ZERO of font OCR-A, approximately 360 such sub-sets can be found. All 360 candidate stored and the highest becomes For characters (cornered) centrelines intersection been without Interpreting with open-ended the final PCS,,,. (free end) stroke cent&lines, with intersections by arrows. 
ONE of OCR-A indicated the number of sub-sets is smaller than for characters of centrelines, a special supplement is provided. in figure 25. The sub-sets (for example, with closed-end the at the stroke centrelines. following For characters As an example, of the lower part of character the three directions the intersection. this implementation O/O is shown of points must be assembled accordingly. This yields a number of sub-sets which is greater than it would have see figure 25) are treated Other types of intersections of the measurement of centrelines procedure it should be noted : over a distance of 0,2 mm; - that deleting 20 of 1 mm corresponds to moving the measurement aperture that the ambiguities caused by rapid variation of the PCS signal along the centreline are removed. C.5.5 PCS,i, to determine PCS,i, the lowest PCS value within the character is nearly the same as that for PCS,,,. for PCS,i". while disregarding some parts printed most of It is the aim of this specification faintly. The procedure for evaluating The lowest 20 % of any part of the centreline All candidate 1 mm in length are deleted and the lowest value is taken as a candidate becomes the final PCS,i". values are stored and the lo&est one It should be noted that this procedure yields a PCS value which had been called PCS,,id in former standards C.5.6 Contrast variation and PCS,,, is checked as to compliance with 5.4.6.8. The quotient of PCS,,, C.5.7 Voids of PCS,,,,,, includes the processing required for completely testing voids It can be readily shown that the evaluation deleting : of the the lowest 20 % of the values of points which extend to a distance over a distance of 0,2 mm; of of 1 mm corresponds to the movement measuring aperture _. assembling points to form parts of centrelines 1 mm in length corresponds void. 
C.5.8 Character shape and strokewidth

A threshold is set to define the final outlines of the character within the PCS matrix according to the following function:

$$
\mathrm{PCS}_4 =
\begin{cases}
0{,}5 \cdot \mathrm{PCS}_3, & \text{if } \mathrm{PCS}_3 \geq 0{,}6 \\
0{,}3, & \text{if } \mathrm{PCS}_3 < 0{,}6
\end{cases}
$$

where PCS3 is the arithmetic average of the PCS values measured along the stroke centrelines that are greater than or equal to the lowest value remaining after the lowest 20 % have been deleted (see C.5.3). PCS3 can readily be computed using the histogram which has been compiled for that evaluation.

Interpreting this formula, it should be noted that:
- for average values of PCS3 greater than 0,6, the threshold is defined as one half of the value of PCS3, which is generally accepted;
- for average values of PCS3 less than 0,6, which occur rather frequently with characters of tolerance range Z (a typical value is PCS3 = 0,4), halving this value would lead to very low thresholds which tend to approach the PCS values of the paper noise;
- the lower limit PCS4 = 0,3 gives a threshold well above the paper noise for characters printed very faintly.

After thresholding, the character in the PCS matrix in the best fit position is checked as to whether it completely fills the minimum COL and, at the same time, does not extend beyond the maximum COL. If this is not the case, the character is checked for edge irregularities. For this purpose, the string-type version of the COLs is used. The specifications given in 5.4.6.10.3 are implemented by simultaneously checking the minimum COL and the maximum COL at one side of the stroke centrelines to find whether any violation exceeds 0,3 mm or the distance between any two violations is less than 0,7 mm, measured from the end of the first to the beginning of the second violation. The different distances between two points of the grid when stepping horizontally or diagonally are taken into account.

C.5.9 Actual strokewidth

The actual strokewidth at any point of a printed character is defined by the distance between two points of the stroke edge on both sides of the stroke centreline. The straight line connecting those points must be perpendicular to the stroke centreline. The stroke edges are given by the threshold PCS4.

As an example, figure 12 shows the character "T" and the PCS signals to be measured along the line P1P2. SK1 and SK2 are the intersection points on the stroke edges. The distance between the two points is the actual strokewidth at that particular part of the stroke.

The strokewidth is not defined for those points of the stroke centreline where the distance to be measured at one side of the centreline or at both sides exceeds 0,3 mm. Thus, the measurement of unrealistic strokewidths is avoided (see points SK3 and SK4 in figure 12).

All strokewidths evaluated by this procedure are stored and a histogram is computed.
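The outline threshold of C.5.8, the edge-irregularity rule described for 5.4.6.10.3 and the strokewidth rule of C.5.9 might be sketched as follows. Assumptions, not taken from the annex: the per-point violation flags for one side of the centreline (failing the minimum COL or extending beyond the maximum COL) are computed elsewhere; the PCS profile across the stroke (the line P1P2 in figure 12) is sampled at the 20 µm raster with the centreline at a known index; and each sample at or above PCS4 is counted as 20 µm of stroke instead of locating the edge points SK1 and SK2 exactly. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

PITCH_MM = 0.02  # raster resolution of 20 µm


def pcs4_threshold(pcs3: float) -> float:
    """Outline threshold of C.5.8: half of PCS3, but never below 0.3."""
    return 0.5 * pcs3 if pcs3 >= 0.6 else 0.3


def edge_irregularities_ok(violations: Sequence[bool], steps_mm: Sequence[float],
                           max_run_mm: float = 0.3, min_gap_mm: float = 0.7) -> bool:
    """Check one side of a stroke centreline for edge irregularities.

    violations[i] is True where the thresholded character fails the COL check;
    steps_mm[i] is the grid step of point i (0.02 straight, 0.02*sqrt(2) diagonal).
    """
    runs = []                      # (start_mm, end_mm) of each violation
    pos, start = 0.0, None
    for flag, step in zip(violations, steps_mm):
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            runs.append((start, pos))
            start = None
        pos += step
    if start is not None:
        runs.append((start, pos))
    tol = 1e-9                     # tolerance for float accumulation
    if any(end - begin > max_run_mm + tol for begin, end in runs):
        return False
    # distance from the end of one violation to the beginning of the next
    return all(nxt[0] - prev[1] >= min_gap_mm - tol
               for prev, nxt in zip(runs, runs[1:]))


def strokewidth_mm(profile: List[float], centre: int, pcs4: float,
                   max_side_mm: float = 0.3) -> Optional[float]:
    """Actual strokewidth across the stroke, or None where it is not defined."""
    def samples_inside(direction: int) -> Optional[int]:
        count, i = 0, centre + direction
        while 0 <= i < len(profile) and profile[i] >= pcs4:
            count, i = count + 1, i + direction
        return count if 0 <= i < len(profile) else None  # None: no edge found

    left, right = samples_inside(-1), samples_inside(+1)
    if left is None or right is None:
        return None
    # distance from the centreline to each edge, half a sample for the centre point
    if max(left, right) * PITCH_MM + PITCH_MM / 2 > max_side_mm:
        return None
    return (left + right + 1) * PITCH_MM
```

For example, a 15-sample stroke profile centred on the centreline gives a strokewidth of 0,3 mm, whereas a stroke extending more than 0,3 mm to either side returns `None`, matching the rule that the strokewidth is not defined there.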
C.5.10 Spots

The size of spots depends on the PCS level at which they are checked. To determine the size and location of spots, the part of the PCS matrix outside the maximum COL is thresholded at a certain level PCS5. The value of PCS5 is defined by the following formula:

$$
\mathrm{PCS}_5 =
\begin{cases}
k \cdot \mathrm{PCS}_{\min}, & \text{if } k \cdot \mathrm{PCS}_{\min} < \mathrm{PCS}_4 \\
\mathrm{PCS}_4, & \text{if } k \cdot \mathrm{PCS}_{\min} \geq \mathrm{PCS}_4
\end{cases}
$$

where k is a constant depending on the respective tolerance range.

Interpreting this formula, one should note that:
- spots do not belong to correctly printed characters and, as a rule, cause a decrease in machine readability;
- it is unrealistic to evaluate spots at a PCS level higher than the digitalization level, because the reading machine also sees them at its specific digitalization level; a threshold higher than that will cause a decrease in the size of the spots, leading to parts of the spots being omitted when checking them. Therefore, the upper limit PCS4 is introduced here for PCS5;
- for smaller values of PCS_min, the threshold PCS5 for spots is proportional to PCS_min, which has been agreed upon previously.

After thresholding the part of the PCS matrix outside the maximum COL, the spots remain to be checked with respect to their size. For this purpose, a circle of 1 mm diameter is centred on each point of the digitalized PCS matrix and the coverage of the circle by spots is evaluated. The spots are allowable if the circle is never covered to more than 1/10 of its area. It can readily be shown that this procedure corresponds to the following specifications:
- using a circle of 1 mm diameter ensures that any two or more parts of spots which are separated by a distance of less than 1 mm are taken into account simultaneously by summing up their areas;
- any extension of a stroke beyond the maximum COL is treated similarly;
- the coverage limit of 1/10 of the area of the circle corresponds to the area of a spot which can be covered by moving the aperture of 0,2 mm in diameter over a distance of 0,2 mm.
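The spot evaluation of C.5.10 can likewise be sketched in Python. The value of k, the use of "greater than or equal to" when thresholding at PCS5, and the approximation of the circle area by counting grid points are assumptions; all names are hypothetical. Instead of literally moving the circle across the matrix, each spot pixel adds one to the count of every circle centre within 0,5 mm, which gives the same coverage values; centres just outside the matrix are also counted, which can only make the check stricter.

```python
from collections import Counter
from typing import List, Set, Tuple

PITCH_MM = 0.02            # raster resolution of 20 µm
CIRCLE_DIAMETER_MM = 1.0   # evaluation circle of C.5.10
MAX_COVERAGE = 0.1         # circle may be covered to at most 1/10 of its area

Pixel = Tuple[int, int]    # (row, column) in the PCS matrix


def pcs5_threshold(pcs_min: float, pcs4: float, k: float) -> float:
    """Spot threshold of C.5.10: k * PCS_min, limited above by PCS4."""
    return min(k * pcs_min, pcs4)


def spot_pixels(pcs: List[List[float]], outside_max_col: List[List[bool]],
                pcs5: float) -> Set[Pixel]:
    """Threshold the part of the PCS matrix outside the maximum COL at PCS5."""
    return {(r, c)
            for r, row in enumerate(pcs)
            for c, value in enumerate(row)
            if outside_max_col[r][c] and value >= pcs5}


def circle_offsets() -> List[Pixel]:
    """Grid offsets inside a circle of 1 mm diameter (radius 25 points at 20 µm)."""
    radius = round(CIRCLE_DIAMETER_MM / 2 / PITCH_MM)
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def spots_allowable(spots: Set[Pixel]) -> bool:
    """True if no 1 mm circle is covered by spots to more than 1/10 of its area."""
    offsets = circle_offsets()
    covered = Counter()
    # Each spot pixel adds one to the coverage count of every circle centre
    # within reach, which equals centring the circle on each point in turn.
    for r, c in spots:
        for dr, dc in offsets:
            covered[(r + dr, c + dc)] += 1
    worst = max(covered.values(), default=0)
    return worst <= MAX_COVERAGE * len(offsets)
```

At the 20 µm raster the discretized circle contains roughly 1 960 grid points, so about 196 of them may belong to spots. A 12 × 12 point spot (0,24 mm square) passes on its own, but two such spots 0,5 mm apart fail together because one circle covers both. As a plausibility check of the last note above, 1/10 of the area of a 1 mm circle is π·0,5²/10 ≈ 0,079 mm², while a 0,2 mm aperture moved over 0,2 mm sweeps π·0,1² + 0,2·0,2 ≈ 0,071 mm².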
C.6 Output of the measurement

Depending on the user's specifications, one or more of the following line printer outputs can be produced per character measured:
- one printer line of information indicating the values of the parameters and the compliance or non-compliance with this International Standard;
- the PCS matrix and the digitalized PCS matrices of the character;
- statistical analysis data if greater quantities of characters have been measured.

[Figure 22 - Mechanical construction and illumination (labelled parts: laser, lenses, power meter, scanner, scanning area, carriage, incremental angle resolver)]

[Figure 23 - Vector of the scan lines (binary vector; blocks 1 to 250; components 1 to 4 096)]

[Figure 24 - Vector of the scan columns (binary vector; component numbers 1 to 4 096)]

[Figure 25 - Intersections of stroke centrelines (characters 4, R and 1)]

[Figure 26 - Actual strokewidth]

Annex D
Character positioning
(Not part of this International Standard.)

D.1 Objectives

Character positioning requirements (format rules) are needed to ensure that each OCR character on a document is "seen" by the reading device without interference from the other OCR characters or from non-OCR matter. The format requirements in this International Standard (which are explained in the following sub-clauses) are to be taken as minimum rules and may need to be supplemented by further rules for specific systems.

D.2 Document reference edge

The document used in an OCR system must be moved and suitably positioned for printing and reading the OCR information. One or more document edges are used to provide a reference for these operations. Because of the diverse nature of OCR documents, it may sometimes be convenient to specify one reference edge (for example for journal rolls); for others it may be necessary to specify two edges (for example, for cheques the bottom and the right-hand edges are usually specified).

The tolerance on the distance between the average centreline of a line of OCR characters and a top or bottom reference edge may be vital to the satisfactory functioning of the system. No dimension is given in the specification for this tolerance since system requirements differ widely, but its importance must not be overlooked.

D.3 Clear area, printing area and margin

OCR printing must be isolated from all other printing or patterns on the document in order to allow the reading device to distinguish the OCR information more readily. This isolation is provided by maintaining a "border" of blank paper between the OCR information and the remainder of the document. From this arises the distinction between the "printing area", which must include all of the OCR characters, and the larger "clear area", which includes the printing area and must be free from any other printing or embossing. If the distance between the boundary of the clear area and that of the printing area approaches the minimum specified, due account must be taken of printing tolerances (vertical misalignment, etc.) and expected paper dimensional changes. It is good practice in document design to provide as generous a clear area as possible.

The boundary of the printing area should be kept well within the paper edges, i.e. the margins should be large. This has the advantage, among others, that a moderate degree of edge mutilation can take place without impairing readability.
There are some special cases, however, where the small size of the document may make large margins impracticable and the boundary of the printing area may then have to lie close to the document edge(s), for example in tape reading. Relaxation of the specification in this respect is only permissible when it has been established that all OCR devices in the system can handle these documents.

The dimensions of the printing area and its position relative to the document edge(s) are important for readers that have limited line-finding capabilities.

D.4 Line spacing

Line spacing is only significant for multiple-line documents. It is the intention of this International Standard to limit the number of lines of printing that may occur within a given vertical distance. This limitation is necessary in addition to the requirements of line separation, since the characters in a line may all be less than full character height (for example symbols such as minus). In such cases, the line spacing must be maintained to permit printing of full height characters.

The maximum line packing densities permitted by the tolerances given in the body of the standard for the three character sizes are approximately:

Table 8

  Size    Lines per 25,4 mm (1 in)
  I       6
  III     4
  IV      3

However, for these values to be acceptable, the tolerances on the parameters influencing line separation (see D.5) must be below the maxima specified, which apply for wider spacing. (The parameters which influence line separation are: line spacing, vertical misalignment, character height and strokewidth.) In general, line spacing should be kept as large as possible, consistent with the other requirements of the system.

D.5 Line separation

Line separation defines the isolation required between successive lines of OCR information. Some documents may require and permit closer spacing of lines of OCR information than can be accommodated with the recommended line separation of 2,5 mm (0.1 in). See D.4.
An absolute minimum value of line separation for each of the three character sizes is given. Where this minimum is approached, an effort should be made to ensure as large a line separation as possible by controlling character alignment, character strokewidth and, if possible, line spacing.

D.6 Character boundary

The character boundary is defined for the actual printed image under examination rather than for an ideal character. This is done in order that the limits assigned to the separation between characters and lines shall be realistic and applicable to any quality of print.

D.7 Character spacing

It is the object of the character spacing requirement of the standard to define the lateral relationship of any pair of characters side by side in the same line in such a way that the maximum and minimum character separation requirements can be met.

As mentioned in 6.7.2, the specification on character spacing will not be met when variable pitch or variable set width printing is used (for example variable pitch typewriters, letterpress). Since these types of printing use wide variation in character width and spacing, they may impose difficulties for OCR devices, and special consideration must be given to the compatibility of the print and the reading equipment.

D.8 Character separation

It is a primary requirement of OCR that characters side by side in the same line shall be isolated by a clearance of unprinted paper. This separation constitutes a vertical band (of width not less than the nominal strokewidth, as defined in 5.3.1) which may not be intruded upon by any part of the character outline.

In order to satisfy the minimum character separation requirement in difficult cases where the nominal character spacing is close to the minimum, the following points need particular attention:
- strokewidth variation;
- character skew;
- the difference that exists for certain characters between their centreline and the vertical reference line given in the character drawings.
For the OCR-B character J (size I), for instance, this distance is as much as 0,18 mm (0.007 in).

D.9 Character misalignment

The vertical misalignment of characters should be limited, to reduce the cost and complexity of OCR devices, to an extent that is compatible with normal and relatively unsophisticated printing equipment. The misalignment may be due to:
- misalignment of individual print faces;
- misalignment of the document in the printer, causing a complete group of characters printed at one time to be displaced vertically and/or tilted (skew);
- local distortion or folding of the document before, during or after printing.

Clause 6 in this International Standard limits the degree of misalignment of adjacent characters, with an overall limit on the misalignment of any two characters in a line. Misalignment of this kind could be caused by printing fields at different times with different printing devices. It is therefore important to determine the potential misalignment and the requirements for a specific application in order that specifications and controls can be established.

Bureau of Indian Standards

BIS is a statutory institution established under the Bureau of Indian Standards Act, 1986 to promote harmonious development of the activities of standardization, marking and quality certification of goods and attending to connected matters in the country.

Copyright

BIS has the copyright of all its publications. No part of these publications may be reproduced in any form without the prior permission in writing of BIS. This does not preclude the free use, in the course of implementing the standard, of necessary details, such as symbols and sizes, type or grade designations. Enquiries relating to copyright should be addressed to the Director (Publications), BIS.

Revision of Indian Standards

Indian Standards are reviewed periodically and revised, when necessary, and amendments, if any, are issued from time to time. Users of Indian Standards should ascertain that they are in possession of the latest amendments or edition. Comments on this Indian Standard may be sent to BIS giving the following reference:

Doc: No. LTDC 24 (1293)

Amendments Issued Since Publication

  Amend No.    Date of Issue    Text Affected

BUREAU OF INDIAN STANDARDS

Headquarters: Manak Bhavan, 9 Bahadur Shah Zafar Marg, New Delhi 110002
Telephones: 331 01 31, 331 13 75
Telegrams: Manaksanstha (Common to all Offices)

Regional Offices:
Central: Manak Bhavan, 9 Bahadur Shah Zafar Marg, NEW DELHI 110002 (331 01 31, 331 13 75)
Eastern: 1/14 C.I.T. Scheme VII M, V.I.P. Road, Maniktola, CALCUTTA 700054 (36 24 99)
Northern: SCO 445-446, Sector 35-C, CHANDIGARH 160036 (2 18 43, 3 16 41)
Southern: C.I.T. Campus, IV Cross Road, MADRAS 600113 (41 24 42)
Western: Manakalaya, E9 MIDC, Marol, Andheri (East), BOMBAY 400093 (6 32 92 95)

Branches: AHMADABAD, BANGALORE, BHOPAL, BHUBANESHWAR, GUWAHATI, HYDERABAD, JAIPUR, KANPUR, PATNA, TRIVANDRUM.

Printed at Printograph, Delhi, India